I have a few problems which may apply well to the Map-Reduce model. I'd like to experiment with implementing them, but at this stage I don't want to go to the trouble of installing a heavyweight system like Hadoop or Disco.
Is there a lightweight Python framework for map-reduce which uses the regular filesystem for input, temporary files, and output?
MapReduce itself is implemented in Java, but jobs can be written in other languages such as Ruby, Python, and C++. Here we are going to use Python with the mrjob package to count the number of reviews for each rating (1-5) in a dataset. Step 1: transform the raw data into key/value pairs in parallel.
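A minimal sketch of that rating-count job with mrjob is below. The input format is an assumption (tab-separated review records with the star rating as the last field), and the class name is just illustrative:

from mrjob.job import MRJob

class MRRatingCount(MRJob):

    def mapper(self, _, line):
        # Assumed layout: tab-separated fields, star rating in the last column.
        fields = line.strip().split('\t')
        rating = fields[-1]
        yield rating, 1          # emit (rating, 1) for each review

    def reducer(self, rating, counts):
        yield rating, sum(counts)  # total reviews per rating

if __name__ == '__main__':
    MRRatingCount.run()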
The Hadoop framework is written in Java, but MapReduce programs can also be coded in languages such as C++, Perl, Python, Ruby, and R, and they can process data stored in a variety of file and database systems.
http://pythonhosted.org/mrjob/ is great for getting started quickly on your local machine; basically, all you need is:
pip install mrjob
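By default mrjob runs with its local/inline runner, reading input from ordinary files and writing results to stdout, which matches the "regular filesystem" requirement in the question. Assuming the rating-count sketch above is saved as mr_rating_count.py and your data is in reviews.tsv (both names are placeholders), you would run:

python mr_rating_count.py reviews.tsv > rating_counts.txt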
A Coursera course dedicated to big data suggests using these lightweight Python Map-Reduce frameworks:
To get you started very quickly, try this example:
https://github.com/michaelfairley/mincemeatpy/zipball/v0.1.2
(hint: for [server address] in this example use localhost)
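For reference, a mincemeat server script looks roughly like the sketch below; the attribute names (datasource, mapfn, reducefn) and the "changeme" password follow the example in the mincemeatpy README, so double-check them against the version you download (v0.1.2 targets Python 2). Start this script, then in another terminal attach a worker with: python mincemeat.py -p changeme localhost

import mincemeat

# Any dict-like object can serve as the data source.
data = ["Humpty Dumpty sat on a wall",
        "Humpty Dumpty had a great fall"]
datasource = dict(enumerate(data))

def mapfn(k, v):
    # Emit (word, 1) for every word in the line.
    for w in v.split():
        yield w, 1

def reducefn(k, vs):
    # Sum the counts for each word.
    return sum(vs)

s = mincemeat.Server()
s.datasource = datasource
s.mapfn = mapfn
s.reducefn = reducefn

results = s.run_server(password="changeme")
print(results)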