 

What are the options for Hadoop on Scala?


We are starting a big-data analytics project and are considering adopting Scala (the Typesafe stack). I would like to know which Scala APIs/projects are available for writing Hadoop MapReduce programs.

asked Jan 30 '13 by prassee





1 Answer

Definitely check out Scalding, Twitter's Scala API for Cascading. Speaking as a user and occasional contributor, I've found it to be a very useful tool. The Scalding API is designed to mirror the standard Scala collections API: just as you can call flatMap, map, or groupBy on normal collections, you can do the same on Scalding Pipes, which you can think of as distributed Lists of tuples. There's also a typed version of the API that provides stronger type-safety guarantees. I haven't used Scoobi, but its API looks similar.
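To make the collections analogy concrete, here is a word count written against plain Scala collections (a `Seq[String]` of lines), so it runs without Hadoop or Scalding on the classpath. The object name `LocalWordCount` is hypothetical. On Scalding, essentially the same flatMap / groupBy chain would be applied to a `TypedPipe` instead of a `Seq`, roughly `TypedPipe.from(TextLine(path)).flatMap(...).groupBy(identity).size`; treat that shape as an approximation and check the Scalding README for the exact source and sink types in your version.

```scala
// Hypothetical local example: the same operator chain a Scalding job would
// run over a distributed pipe, here applied to an in-memory Seq.
object LocalWordCount {
  /** Counts occurrences of each whitespace-separated word across all lines. */
  def count(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split("\\s+").toSeq) // one element per word
      .filter(_.nonEmpty)                        // drop empties caused by leading whitespace
      .groupBy(word => word)                     // word -> all of its occurrences
      .map { case (word, occurrences) => word -> occurrences.size }

  def main(args: Array[String]): Unit = {
    val lines = Seq("to be or not", "to be")
    // Print tab-separated (word, count) pairs in alphabetical order.
    count(lines).toSeq.sortBy(_._1).foreach { case (w, n) => println(s"$w\t$n") }
  }
}
```

Only the collection type changes between the local and distributed versions; the transformation logic reads the same, which is the main appeal of the Scalding API described above.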

Additionally, there are a few other benefits:

  • Scalding is heavily used in production at Twitter and has been battle-tested on Twitter-scale datasets.
  • It has several active contributors both inside and outside Twitter that are committed to making it great.
  • It is interoperable with your existing Cascading jobs.
  • In addition to the Typed API, it has a Fields API, which may be more familiar to users of R and data-frame frameworks.
  • It provides a robust Matrix Library.
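The trade-off between the Fields API and the Typed API is mostly about when mistakes surface. The sketch below is an analogy in plain Scala, not Scalding code; `FieldsVsTyped`, `Row`, and `Sale` are hypothetical names. A fields-style row is looked up by column name, so a misspelled column or a value of the wrong type only fails at run time, while a typed row is a case class whose fields the compiler checks.

```scala
// Analogy only (not the Scalding API): named-column rows versus typed rows.
object FieldsVsTyped {
  // Fields-style row: column name -> untyped value (hypothetical representation).
  type Row = Map[String, Any]

  // Sums one column. A typo in the column name or a non-Double value is only
  // detected at run time, when a row is actually read; None signals that failure.
  def totalByField(rows: Seq[Row], column: String): Option[Double] =
    try Some(rows.map(row => row(column).asInstanceOf[Double]).sum)
    catch { case _: NoSuchElementException | _: ClassCastException => None }

  // Typed-style row: the schema is a case class, so `sale.prcie` would be a
  // compile-time error instead of a run-time failure.
  final case class Sale(item: String, price: Double)

  def totalTyped(sales: Seq[Sale]): Double = sales.map(_.price).sum

  def main(args: Array[String]): Unit = {
    val rows: Seq[Row] = Seq(Map("item" -> "a", "price" -> 2.5), Map("item" -> "b", "price" -> 1.5))
    println(totalByField(rows, "price")) // Some(4.0)
    println(totalByField(rows, "prcie")) // None: the typo surfaces only when the code runs
    println(totalTyped(Seq(Sale("a", 2.5), Sale("b", 1.5)))) // 4.0
  }
}
```

Named columns feel natural if you come from R or data frames, while the typed style moves that class of error to compile time, which matters more as jobs grow.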
answered Nov 13 '22 by arkajit
