
Apache Spark or Cascading framework? [closed]

I am confused as to when to use the Cascading framework and when to use Apache Spark. What are suitable use cases for each one?

Any help is appreciated.

progrrammer asked Aug 11 '14 10:08



1 Answer

At heart, Cascading is a higher-level API that sits on top of execution engines like MapReduce. In this sense it is analogous to Apache Crunch. Cascading also has a few related projects, such as a Scala version (Scalding) and PMML scoring (Pattern).

Apache Spark is similar in that it also exposes a high-level API for data pipelines, one that is available in Java and Scala.

However, Spark is more of an execution engine itself than a layer on top of one. It has a number of associated projects, such as MLlib, Streaming, and GraphX, for machine learning, stream processing, and graph computation, respectively.
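To make the "API layer versus execution engine" distinction concrete, here is a toy sketch in plain Python. It uses neither Cascading's nor Spark's real API: `Pipeline`, `LocalEngine`, and `word_count` are hypothetical names invented for illustration. The `Pipeline` class only describes what to compute, while the engine decides how to run it. Roughly speaking, Cascading plays the role of the pipeline description and hands execution to an engine such as MapReduce, whereas Spark provides both pieces itself.

```python
from collections import Counter


class Pipeline:
    """Hypothetical high-level API: records *what* to compute, runs nothing."""

    def __init__(self, steps=()):
        self.steps = tuple(steps)

    def flat_map(self, fn):
        # Return a new pipeline with one more step; nothing is executed here.
        return Pipeline(self.steps + (("flat_map", fn),))

    def filter(self, fn):
        return Pipeline(self.steps + (("filter", fn),))


class LocalEngine:
    """Hypothetical execution engine: decides *how* the recorded steps run."""

    def run(self, pipeline, records):
        data = list(records)
        for kind, fn in pipeline.steps:
            if kind == "flat_map":
                data = [out for rec in data for out in fn(rec)]
            elif kind == "filter":
                data = [rec for rec in data if fn(rec)]
            else:
                raise ValueError(f"unknown step: {kind}")
        return data


def word_count(engine, lines):
    # The pipeline definition does not depend on which engine runs it.
    pipeline = (
        Pipeline()
        .flat_map(lambda line: line.lower().split())
        .filter(lambda word: word.isalpha())
    )
    return Counter(engine.run(pipeline, lines))
```

Calling `word_count(LocalEngine(), lines)` runs the same pipeline definition on whichever engine is passed in. In this toy model, swapping the engine class without touching the pipeline is what a layered API offers, while an engine that ships its own API ties the two together.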

Overall, I find Spark a lot more interesting today, but the two are not meant for exactly the same thing.

Sean Owen answered Oct 02 '22 23:10
