What type of problems can mapreduce solve?

Is there a theoretical analysis available which describes what kind of problems mapreduce can solve?

asked Apr 01 '09 12:04 by amit-agrawal


People also ask

What kind of problems can MapReduce solve?

MapReduce was developed at Google largely for web-scale jobs such as indexing the internet. It can be applied to any problem with the same shape: first decompose the input into independent pieces (map), then aggregate, or reduce, the intermediate results to find the answer. It is a poor fit for problems that cannot be broken down this way.
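To make the decompose-then-aggregate shape concrete, here is a minimal single-process sketch of the classic word-count job. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and the in-memory `shuffle` only stands in for the grouping a real framework such as Hadoop performs across machines; it is not any framework's API.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    # Emit (word, 1) for every word in one input split (hypothetical mapper).
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group intermediate values by key, as the framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Aggregate all counts emitted for one word (hypothetical reducer).
    return key, sum(values)


def word_count(documents):
    # Map every document independently, then group and reduce the results.
    intermediate = chain.from_iterable(map_phase(d) for d in documents)
    return dict(reduce_phase(k, vs) for k, vs in shuffle(intermediate).items())


counts = word_count(["the cat sat", "The dog sat"])
print(counts)  # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

Because each `map_phase` call looks at only one document, the map calls could run on different machines; only the grouping step needs to bring matching keys together.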

What issues must be solved for MapReduce to work?

The identified MapReduce challenges are grouped into four main categories corresponding to Big Data task types: data storage, analytics, online processing, and security and privacy.

When should MapReduce be used?

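MapReduce is suited to batch computation over large quantities of data that require parallel processing. A job is expressed as a data flow (map, then group by key, then reduce) rather than as a step-by-step procedure. Iterative algorithms can be run as a chain of MapReduce jobs, one per iteration, although classic implementations re-read and re-write their data every round, which makes iterative work comparatively slow. It has also been used for large-scale graph analysis, such as computing the PageRank of web documents.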
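As a rough illustration of the last two points, the sketch below expresses one simplified PageRank iteration as a map step and a reduce step, and runs the iterations from a driver loop in which each pass would correspond to a separate MapReduce job. It assumes a damping factor of 0.85 and a graph in which every page has at least one out-link (no dangling-node handling); all names are hypothetical and everything runs in a single process.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional damping factor; an assumption for this sketch


def pagerank_map(node, rank, out_links):
    # Pass the graph structure through so the next iteration still has it.
    yield node, ("links", out_links)
    # Split this node's rank evenly among the pages it links to.
    for target in out_links:
        yield target, ("rank", rank / len(out_links))


def pagerank_reduce(node, values, n_nodes):
    # Recover the node's out-links and sum the rank contributions it received.
    out_links, incoming = [], 0.0
    for kind, payload in values:
        if kind == "links":
            out_links = payload
        else:
            incoming += payload
    new_rank = (1 - DAMPING) / n_nodes + DAMPING * incoming
    return node, (new_rank, out_links)


def pagerank_iteration(graph):
    # graph: {node: (rank, out_links)}; every node is assumed to have >= 1 out-link.
    groups = defaultdict(list)  # stands in for the framework's shuffle
    for node, (rank, links) in graph.items():
        for key, value in pagerank_map(node, rank, links):
            groups[key].append(value)
    return dict(pagerank_reduce(k, vs, len(graph)) for k, vs in groups.items())


graph = {"a": (1 / 3, ["b", "c"]), "b": (1 / 3, ["c"]), "c": (1 / 3, ["a"])}
for _ in range(50):  # each pass would be a separate MapReduce job
    graph = pagerank_iteration(graph)
ranks = {node: rank for node, (rank, _) in graph.items()}
print(ranks)
```

Note that the whole graph is emitted and regrouped on every pass; that repeated shuffle is exactly the per-iteration cost mentioned above.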

What is MapReduce, and what are its benefits?

In simple terms, MapReduce is a programming model that lets a computation scale out across the many servers of a cluster, such as a Hadoop cluster. It can be used to write applications that process huge amounts of data in parallel on clusters of commodity hardware.


2 Answers

In "Map-Reduce for Machine Learning on Multicore", Chu et al. describe "algorithms that fit the Statistical Query model can be written in a certain 'summation form,' which allows them to be easily parallelized on multicore computers." They implement 10 algorithms using a map-reduce framework, including weighted linear regression, k-Means, Naive Bayes, and SVM.

The Apache Mahout project has released a Hadoop (Java) implementation of some of the methods, based on the ideas from this paper.
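A minimal sketch of the "summation form" idea, assuming weighted linear regression with a single feature plus an intercept so that no matrix library is needed: each mapper reduces its data partition to a few partial sums of the normal equations, the reducer simply adds those sums, and a small 2x2 solve yields the coefficients. The function names are hypothetical; this is not code from the paper or from Mahout.

```python
from functools import reduce


def map_partition(rows):
    # rows: list of (x, y, w). Emit partial sums for the weighted normal
    # equations with features [1, x]: A = sum w*f*f^T, b = sum w*f*y.
    a00 = a01 = a11 = b0 = b1 = 0.0
    for x, y, w in rows:
        a00 += w
        a01 += w * x
        a11 += w * x * x
        b0 += w * y
        b1 += w * x * y
    return (a00, a01, a11, b0, b1)


def reduce_sums(s, t):
    # Partial sums from different partitions combine by plain addition.
    return tuple(p + q for p, q in zip(s, t))


def solve(stats):
    # Solve the 2x2 system [[a00, a01], [a01, a11]] @ theta = [b0, b1].
    a00, a01, a11, b0, b1 = stats
    det = a00 * a11 - a01 * a01
    intercept = (a11 * b0 - a01 * b1) / det
    slope = (a00 * b1 - a01 * b0) / det
    return intercept, slope


# Two partitions of points on y = 1 + 2x, all with weight 1.
partitions = [[(0, 1, 1), (1, 3, 1)], [(2, 5, 1), (3, 7, 1)]]
stats = reduce(reduce_sums, map(map_partition, partitions))
intercept, slope = solve(stats)
print(intercept, slope)  # 1.0 2.0
```

Only five numbers per partition cross the network, regardless of how many rows each partition holds, which is what makes this form easy to parallelize.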

answered Oct 07 '22 12:10 by bubaker


For problems that require processing and generating large data sets. For example, running an interest-generation query over all the accounts a bank holds, or processing audit data for all transactions a bank handled in the past year. The best-known use case is Google's: generating the search index for the Google search engine.
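The search-index use case boils down to building an inverted index. Here is a minimal single-process sketch, assuming whitespace tokenization and lowercasing only: the map step emits `(term, doc_id)` pairs, a dictionary stands in for the framework's shuffle, and the reduce step turns each term's document ids into a sorted posting list. All names are hypothetical.

```python
from collections import defaultdict


def index_map(doc_id, text):
    # Emit (term, doc_id) once per distinct term in the document.
    for term in set(text.lower().split()):
        yield term, doc_id


def index_reduce(term, doc_ids):
    # The posting list for a term: sorted ids of the documents containing it.
    return term, sorted(doc_ids)


def build_index(docs):
    groups = defaultdict(list)  # stands in for the framework's shuffle/sort
    for doc_id, text in docs.items():
        for term, d in index_map(doc_id, text):
            groups[term].append(d)
    return dict(index_reduce(t, ids) for t, ids in groups.items())


index = build_index({1: "MapReduce processes large data", 2: "large clusters process data"})
print(index["data"])       # [1, 2]
print(index["mapreduce"])  # [1]
```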

answered Oct 07 '22 13:10 by sangupta