Is there a simple Python map-reduce framework that uses the regular filesystem?

I have a few problems that may be well suited to the MapReduce model. I'd like to experiment with implementing them, but at this stage I don't want to go to the trouble of installing a heavyweight system like Hadoop or Disco.

Is there a lightweight Python framework for map-reduce which uses the regular filesystem for input, temporary files, and output?

Reid asked Apr 18 '13 at 21:04
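If you only want to experiment with the model, the pattern itself is small enough to sketch with the standard library. The following is a minimal, single-machine sketch (not the API of any existing framework) that uses the regular filesystem for all three stages: it reads plain-text input files, spills mapped key/value pairs into JSON-lines partition files in a temporary directory, and writes the reduced results to an output file. `run_mapreduce`, `word_mapper` and `sum_reducer` are hypothetical names, and the sketch assumes string keys and JSON-serializable values.

```python
import json
import tempfile
import zlib
from collections import defaultdict
from pathlib import Path


def word_mapper(line):
    """Hypothetical example mapper: emit (word, 1) for each word in a line."""
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key, values):
    """Hypothetical example reducer: add up all values seen for one key."""
    return sum(values)


def run_mapreduce(input_paths, mapper, reducer, output_path, num_partitions=4):
    """Map every input line, spill pairs to partition files on disk, then reduce each partition."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_dir = Path(tmp)
        part_paths = [tmp_dir / f"part-{i}.jsonl" for i in range(num_partitions)]

        # Map phase: write each (key, value) pair as one JSON line into a partition file.
        part_files = [open(p, "w", encoding="utf-8") for p in part_paths]
        try:
            for path in input_paths:
                with open(path, encoding="utf-8") as src:
                    for line in src:
                        for key, value in mapper(line):
                            # crc32 is stable across runs, unlike the salted built-in hash() of str.
                            index = zlib.crc32(key.encode("utf-8")) % num_partitions
                            part_files[index].write(json.dumps([key, value]) + "\n")
        finally:
            for f in part_files:
                f.close()

        # Reduce phase: all values for a given key landed in the same partition file,
        # so each partition can be grouped and reduced independently.
        with open(output_path, "w", encoding="utf-8") as out:
            for part_path in part_paths:
                grouped = defaultdict(list)
                with open(part_path, encoding="utf-8") as part:
                    for record in part:
                        key, value = json.loads(record)
                        grouped[key].append(value)
                for key in sorted(grouped):
                    out.write(json.dumps([key, reducer(key, grouped[key])]) + "\n")
```

Calling `run_mapreduce(["a.txt", "b.txt"], word_mapper, sum_reducer, "counts.jsonl")` leaves one `[word, count]` JSON line per word in `counts.jsonl`. Because each partition file holds every value for its keys, the reduce loop could later be split across processes or machines without changing the mapper or reducer.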


People also ask

Can MapReduce be written in Python?

Hadoop MapReduce is written in Java, but MapReduce jobs can be written in other languages such as Ruby, Python, and C++ (for example, through Hadoop Streaming). In Python, one option is the mrjob package. A typical example counts the number of reviews for each rating (1, 2, 3, 4, 5) in a dataset; step 1 is to transform the raw data into key/value pairs in parallel.
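The rating-count example mentioned above can be sketched in plain Python to show the three MapReduce steps without any framework. This is an illustrative sketch, not mrjob code: the input format (one `review_id,rating` line per review) and the function names are assumptions made for the example.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_review(line):
    """Step 1 (map): turn one raw line 'review_id,rating' (assumed format) into a (rating, 1) pair."""
    _review_id, rating = line.strip().split(",")
    return int(rating), 1


def shuffle(pairs):
    """Step 2 (shuffle): group all values emitted for the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(grouped):
    """Step 3 (reduce): sum the 1s for each rating, ordered by rating."""
    return {rating: sum(ones) for rating, ones in sorted(grouped.items())}


def count_ratings(lines, workers=4):
    # Each map call is independent, so records can be mapped concurrently.
    # Threads keep the sketch simple; for CPU-bound mappers a process pool is the usual choice.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pairs = list(pool.map(map_review, lines))
    return reduce_counts(shuffle(pairs))


if __name__ == "__main__":
    print(count_ratings(["r1,5", "r2,3", "r3,5", "r4,1"]))  # {1: 1, 3: 1, 5: 2}
```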

Can we write Hadoop in Python?

The Hadoop framework is written in Java; however, Hadoop programs can also be written in Python or C++.

Which programming language has been used by MapReduce framework?

MapReduce programs are usually written in Java; however, they can also be written in languages such as C++, Perl, Python, Ruby, and R. These programs can process data stored in a variety of file and database systems.


2 Answers

mrjob (http://pythonhosted.org/mrjob/) is a great way to get started quickly on your local machine; basically, all you need is:

pip install mrjob

gterzian answered Nov 15 '22 at 21:11


A Coursera course on big data suggests these lightweight Python MapReduce frameworks:

  • http://code.google.com/p/octopy/
  • https://github.com/michaelfairley/mincemeatpy

To get you started very quickly, try this example:

https://github.com/michaelfairley/mincemeatpy/zipball/v0.1.2

(Hint: in this example, use localhost for [server address].)

Pavel answered Nov 15 '22 at 21:11
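In the mincemeat example linked above, the server is given a `datasource` dict along with a `mapfn` and a `reducefn`, and connected clients do the actual work. Before starting a server and clients, it can help to check those two functions in a single process. The helper below, `run_locally`, is hypothetical and not part of mincemeat; it assumes the convention from the project's example, where `mapfn(k, v)` yields `(key, value)` pairs and `reducefn(k, vs)` returns one result per key. The sample data is made up for illustration.

```python
from collections import defaultdict

# Assumed mincemeat-style inputs: a dict of records plus a map and a reduce function.
datasource = dict(enumerate([
    "the quick brown fox",
    "the lazy dog",
]))


def mapfn(k, v):
    # Emit (word, 1) for every word in one record.
    for word in v.split():
        yield word, 1


def reducefn(k, vs):
    # Combine all counts emitted for one word.
    return sum(vs)


def run_locally(datasource, mapfn, reducefn):
    """Hypothetical helper (not part of mincemeat): run mapfn/reducefn in-process."""
    grouped = defaultdict(list)
    for k, v in datasource.items():
        for out_key, out_value in mapfn(k, v):
            grouped[out_key].append(out_value)
    return {key: reducefn(key, values) for key, values in grouped.items()}


if __name__ == "__main__":
    print(run_locally(datasource, mapfn, reducefn))
```

Once the results look right, the same `datasource`, `mapfn` and `reducefn` can be handed to the mincemeat server as in the linked example.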