 

Julia on Hadoop? [closed]

I'm a Hadoop engineer with a primary interest in machine learning and data mining. With data locality and modern tools like Spark (and especially MLlib), analysing terabytes of data becomes easy and even pleasant. So far I've been using the Python API to Spark (PySpark) and am pretty satisfied with it.

However, a strong new player in scientific computing has recently appeared: Julia. With its JIT compilation and built-in parallelism (among other things), it may become a good competitor to traditional tools. So I'm interested: if I switch to Julia at some point, what are my options for using it on top of an existing Hadoop stack? Are there any bindings or bridges that allow running Julia scripts while still utilizing HDFS's data locality?
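For context, this is roughly what I mean by built-in parallelism, as a minimal sketch in current Julia using the Distributed standard library (the function `heavy_transform` is just a placeholder of mine, not a real workload):

```julia
using Distributed

addprocs(4)                                # start 4 local worker processes

# Define the work function on every worker process.
@everywhere heavy_transform(x) = x^2 + 1   # stand-in for real analysis

# pmap farms the calls out to the workers in parallel.
results = pmap(heavy_transform, 1:1_000)
```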

EDIT. To make it clear: I'm not asking which tools are best, nor comparing Julia (or Hadoop) to other tools, nor promoting any computational stack. My question is about projects that may help in integrating the two technologies. No opinions, no long deliberation, just links to projects with a short description.

asked Jun 23 '14 by ffriend

1 Answer

  • Elly.jl is a "Hadoop HDFS and Yarn client" (a usage sketch follows this list)
  • the start of a Spark implementation: https://github.com/d9w/Spark.jl
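To give a flavour of the Elly.jl route, here is a minimal sketch of listing a directory and streaming a file from HDFS. The host, port, and paths are placeholders, and the constructor/method names are my recollection of Elly's README, so check the package documentation for the exact API:

```julia
using Elly

# Connect to the HDFS namenode (host and port are illustrative).
dfs = HDFSClient("namenode-host", 9000)

# Directory listing, mirroring Julia's usual filesystem functions.
println(readdir(dfs, "/data"))

# Stream a file line by line.
open(HDFSFile(dfs, "/data/input.txt"), "r") do io
    for line in eachline(io)
        # ... process `line` ...
    end
end
```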

edit: I should also point out the JavaCall package, which may allow using existing Java libraries in this area: https://github.com/aviks/JavaCall.jl
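As an illustration of that route, the sketch below calls Hadoop's Java FileSystem API from Julia via JavaCall. The classpath, class names, and object-construction idioms here are assumptions on my part and would need to be checked against JavaCall's documentation and your Hadoop installation:

```julia
using JavaCall

# Start the JVM with the Hadoop client jars on the classpath
# (the path below is illustrative).
JavaCall.init(["-Djava.class.path=/opt/hadoop/share/hadoop/common/lib/*"])

# Import the Hadoop classes as Julia types.
Configuration = @jimport org.apache.hadoop.conf.Configuration
FileSystem    = @jimport org.apache.hadoop.fs.FileSystem
HPath         = @jimport org.apache.hadoop.fs.Path

conf = Configuration(())                                              # no-arg constructor
fs   = jcall(FileSystem, "get", FileSystem, (Configuration,), conf)   # static method call
p    = HPath((JString,), "hdfs:///data/input.txt")

# Ask the namenode whether the path exists.
println(jcall(fs, "exists", jboolean, (HPath,), p))
```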

(edit: originally linked to a now-deprecated HDFS binding project also by the Elly developer: https://github.com/tanmaykm/HDFS.jl)

answered by Isaiah Norton