I'm a Hadoop engineer with a primary interest in machine learning and data mining. By exploiting data locality and using modern tools like Spark (and especially MLlib), analysing terabytes of data becomes easy and even pleasant. So far I'm using the Python API to Spark (PySpark) and am pretty satisfied with it.
However, a strong new player in scientific computing has recently appeared: Julia. With its JIT compilation and built-in parallelism (among other things), it may become a good competitor to traditional tools. So I'm interested: if I switch to Julia at some point, what are my options for running it on top of an existing Hadoop stack? Are there any bindings or bridges that allow running Julia scripts while still exploiting HDFS's data locality?
EDIT. To make it clear: I'm not asking which tools are best, not comparing Julia (or Hadoop) to other tools, and not promoting any particular computational stack. My question is about projects that may help in integrating the two technologies. No opinions, no long deliberation - just links to projects and short descriptions.
edit: I should also point out the JavaCall package, which may let you use existing Java libraries in this area: https://github.com/aviks/JavaCall.jl
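As a rough sketch of what that looks like: JavaCall starts an embedded JVM and lets Julia call into Java classes. The snippet below uses the standard java.lang.Math example from the JavaCall docs; reaching Hadoop classes (e.g. org.apache.hadoop.fs.FileSystem) is shown only as a comment, since it assumes the Hadoop client jars are on the JVM classpath.

```julia
using JavaCall

# Start an embedded JVM. To make Hadoop classes visible, the Hadoop client
# jars would need to be added to the classpath before/at init (not shown here).
JavaCall.init(["-Xmx128M"])

# Import a Java class and call a static method on it
jlm = @jimport java.lang.Math
jcall(jlm, "sin", jdouble, (jdouble,), pi / 2)   # returns 1.0

# The same pattern would apply to Hadoop's Java API, e.g.
# @jimport org.apache.hadoop.fs.FileSystem
```

This gives you access to the Java side of the Hadoop stack, but by itself it does not give you data-locality-aware scheduling; it's a bridge to the Java libraries, not a replacement for a cluster framework.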
(edit: originally linked to a now-deprecated HDFS binding project also by the Elly developer: https://github.com/tanmaykm/HDFS.jl)