Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Clojure or Scala for bioinformatics/biostatistics/medical research [closed]

I am not a professional programmer (my area is medical research), but I am quite capable in C/C++, and various scripting languages. A while back I got intrigued by Lisp, but I never got the time to seriously learn it. After a brief exposure to R I decided to invest more time in a functional programming language.

I would like the practicality of a JVM language and thus narrowed to Clojure and Scala. From what I understand, both can use already existing Java libraries and given at performance-critical code can be delegated to Java, have the potential to perform relatively equally well.

How do these languages compare in the application space I need them for? Are There any real-life projects in bioinformatics using either?

Already existing code would be a serious plus, as would be good documentation and a fairly gentle learning curve. Also, how does the concurrency model of the two compare with each other?

Any significant advantages/disadvantages any one has?

like image 980
kliron Avatar asked Mar 09 '11 18:03

kliron


People also ask

Is Scala faster than Clojure?

Scala is faster than Clojure. The microservice development is efficient.

How the strengths of Lisp family languages facilitate building complex and flexible bioinformatics applications?

Put simply, Lisp-family languages enable programmers to more quickly write programs that run faster than in other languages. Languages such as Common Lisp, Scheme and Clojure facilitate the creation of powerful and flexible software that is required for complex and rapidly evolving domains like biology.


2 Answers

I can personally vouch for Clojure as a great tool for this kind of work. (I believe Scala would be great too, I just have less experience with it).

My personal research is in the field of predictive modelling / machine learning and is very computationally intensive - so I think it has many parallels with bioinformatics or biostatistics.

My personal approach / setup includes:

  • Incanter used primarily as a data visualisation tool. Great for producing quick visualisations which are usually just 1-liners at the REPL. There are also lots of statistical and numerical processing tools which I believe use the Colt library under the hood. I'm not an expert in R but I understand that Incanter is roughly "R translated to Clojure/Lisp".

  • Exploiting quite a few Java libraries as needed. Some of these are my own, for example algorithms that I have written in Java in order to get the best possible fine-tuned performance out of the JVM. But you could equally easily use any of the other great Java libraries available, as calling Java from Clojure is very simple (.methodName object param1 param2)

  • Quite a lot of higher order functions to automate my workflow. For example I have a higher order function that will run an optimisation algorithm of any kind in a loop for a specified amount of time and then produce an Incanter graph of the improvement on each iteration. Not rocket science, but really easy to code up in a few lines of Clojure.

  • Never really having to worry about performance. You can make Clojure go pretty fast if you want to (e.g. with type hints, primitive arithmetic support etc.) but normally it's irrelevant as you're going to spend 99%+ of your cycles in well-optimised library code anyway. Hence a bit of overhead in the "glue" code is negligible - I feel I gain much more in terms of personal productivity by having a dynamic, high-level, functional language to work in.

  • Major use of Clojure's concurrency features - this has to be one of Clojure's strongest features. I tend to use the STM to code concurrent processes with transactions that can't interfere with each other, then kick off long-running calculations in a future so that I can get on with other tasks and wait for notification of the result.

  • A slowly growing collection of macros to "extend the language" when needed. I actually use macros less than I thought I would (higher order functions are often a better choice). But when you need them they are invaluable - this is where you really appreciate the value of a homoiconic language. Since they effectively allow you to add new syntax to the language itself, they are very powerful when used correctly to build the DSL that you need.

In short - I don't think you can go wrong with Clojure as a researcher.

The one thing I probably wouldn't use it for (yet) is actually writing a new numerical library - this would probably be better done in Scala or pure Java as you would probably want to adopt a more imperative / OOP style.

like image 170
mikera Avatar answered Sep 28 '22 00:09

mikera


I am not sure about bioinformatics and biostatistics per se, but I do scientific data analysis frequently and I appreciate that Scala allows me to write as-fast-as-Java code with relative ease. I believe that it is often possible in Clojure now, but I haven't seen the benchmarks to back that up. For the time being, I think the prudent thing to assume is that they do not perform equally well. See, for example, the Computer Languages Benchmark Game, where Scala is faster than Clojure in every single test. (Ignore the horrible "pidigits" result for Clojure--Scala (and Java) are calling the GMP library written in C, which Clojure could do but because of a technical detail requiring a different wrapping for the library, isn't presently allowed in the game). Looking at multicore comparisons doesn't improve Clojure's showing, and note that the Clojure code is no shorter for these sorts of lowish-level algorithmic tasks.

Clojure is ahead for the time being with parallel collections, though the upcoming 2.9 release of Scala should make up much of the difference. Neither has a gentle learning curve when coming from C++; Scala is maybe a little easier given that the syntax outwardly looks a little more familiar. I believe there are good materials for learning each of them.


Edit: P.S. You can call R from Java (and therefore from either Clojure or Scala) using rJava (specifically the JRI interface). Edit to edit: and, these days, rScala.

Edit #2: Scala was faster than Clojure in everything at the time of writing; as of this edit, Clojure's a little ahead in one (at the cost of a huge amount of code)--but anyway, the overall point stands. (And the Scala implementation on that one test could be sped up.)

like image 39
Rex Kerr Avatar answered Sep 28 '22 02:09

Rex Kerr