I'm a Hadoop engineer with a primary interest in machine learning and data mining. By exploiting data locality and using modern tools like Spark (and especially MLlib), analysing terabytes of data becomes easy and even pleasant. So far I'm using the Python API to Spark (PySpark) and am pretty satisfied with it.
However, a strong new player in scientific computing has recently appeared: Julia. With its JIT compilation and built-in parallelism (among other things), it may become a good competitor to traditional tools. So I'm interested: if I switch to Julia at some point, what are my options for running it on top of an existing Hadoop stack? Are there any bindings or bridges that allow running Julia scripts while still exploiting HDFS's data locality?
EDIT. To make it clear: I'm not asking which tools are best, not comparing Julia (or Hadoop) to other tools, and not promoting any particular computational stack. My question is about projects that may help in integrating the two technologies. No opinions, no long deliberation - just links to projects and short descriptions.
edit: I should also point out the JavaCall package, which may let you use existing Java libraries in this area: https://github.com/aviks/JavaCall.jl
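As a rough sketch of what that looks like: JavaCall starts an embedded JVM and lets Julia call into Java classes. The snippet below uses the standard java.lang.Math example from the JavaCall docs; reaching Hadoop classes (e.g. org.apache.hadoop.fs.FileSystem) is shown only as a comment, since it assumes the Hadoop client jars are on the JVM classpath.

```julia
using JavaCall

# Start an embedded JVM. To make Hadoop classes visible, the Hadoop client
# jars would need to be added to the classpath before/at init (not shown here).
JavaCall.init(["-Xmx128M"])

# Import a Java class and call a static method on it
jlm = @jimport java.lang.Math
jcall(jlm, "sin", jdouble, (jdouble,), pi / 2)   # returns 1.0

# The same pattern would apply to Hadoop's Java API, e.g.
# @jimport org.apache.hadoop.fs.FileSystem
```

This gives you access to the Java side of the Hadoop stack, but by itself it does not give you data-locality-aware scheduling; it's a bridge to the Java libraries, not a replacement for a cluster framework.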
(edit: originally linked to a now-deprecated HDFS binding project also by the Elly developer: https://github.com/tanmaykm/HDFS.jl)