We are starting a big-data based analytic project and we are considering to adopt scala (typesafe stack). I would like to know the various scala API's/projects which are available to do hadoop , map reduce programs.
Several of the Hadoop's high-performance data frameworks are written in Scala or Java. The main reason for using Scala in these environments is due to its amazing concurrency support, which is the key in parallelizing processing of the large data sets.
Scala isn't a processing engine (as are both Hadoop and Spark) but rather a language that is used in data processing, distributed computing, and web development.
The main difference between Spark and Scala is that the Apache Spark is a cluster computing framework designed for fast Hadoop computation while the Scala is a general-purpose programming language that supports functional and object-oriented programming.
Definitely check out Scalding. Speaking as a user and occasional contributor, I've found it to be a very useful tool. The Scalding API is also meant to be very compatible with the standard Scala collections API. Just as you can call flatMap, map, or groupBy on normal collections, you can do the same on scalding Pipes, which you can imagine as a distributed List of tuples. There's also a typed version of the API which provides stronger type-safety guarantees. I haven't used Scoobi, but the API seems similar to what they have.
Additionally, there are a few other benefits:
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With