 

Is MapReduce well suited for solving problems in a single-machine multiple-core in-memory environment?

Is the MapReduce abstraction a good one for dealing with problems even on a single machine? For example, I have a 12-core machine and I have to count words in thousands of files (the classic MapReduce example).

Is using a MapReduce implementation, with mappers and reducers running in multiple threads, a good way to solve this problem, given that we're working on a single machine with a single hard drive?

I guess my question comes down to this: Is the MapReduce paradigm good only for working in a cluster of machines?
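For concreteness, the word-count scenario above can be sketched on a single multi-core machine with Python's standard `multiprocessing` module. The function names and the 12-worker pool size below are illustrative assumptions, not part of any MapReduce library:

```python
# Minimal single-machine "MapReduce" word count.
# map phase: one Counter per document, run across worker processes.
# reduce phase: merge the partial Counters into one result.
from collections import Counter
from functools import reduce
from multiprocessing import Pool

def map_phase(text):
    """Mapper: emit a word-count table for one document."""
    return Counter(text.split())

def reduce_phase(a, b):
    """Reducer: merge two partial word-count tables."""
    a.update(b)
    return a

def word_count(documents, workers=12):
    # workers=12 matches the 12-core machine in the question (assumption).
    with Pool(workers) as pool:
        partials = pool.map(map_phase, documents)
    return reduce(reduce_phase, partials, Counter())

if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(word_count(docs))  # Counter({'the': 3, 'fox': 2, ...})
```

The pool handles the "shuffle" trivially because everything lives in one address space; whether this beats a plain single-threaded loop depends on how much of the work is CPU-bound versus disk-bound.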

Felipe Hummel, asked Jun 24 '11



1 Answer

In general you can have two situations:

  1. Your problem is small enough to fit into the memory of your single system and your single system has enough CPU power to solve the problem within the required time.
  2. Your problem is too big:
     2.1 The running time is too long (disk I/O and/or CPU time).
     2.2 The data is too big to fit into memory (RAM).

For 2.1 and 2.2 the MapReduce paradigm helps a lot by splitting the work into many smaller chunks. If you need more CPU power, you simply add more machines.

So if you have a single system and it turns out your problem is too big to fit into memory (point 2.2) you can still benefit from the fact that MapReduce can easily put a part of the problem on disk until that part is to be processed.

The important fact is that if you have a problem that is small enough to fit into memory and small enough to be processed on a single system then a dedicated (non-MapReduce) solution can be a lot faster.
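To illustrate that last point, a dedicated (non-MapReduce) word count that fits in memory is just a single pass with no coordination overhead. The helper name and file-based input are assumptions for illustration:

```python
from collections import Counter

def word_count(paths):
    # One pass, one thread: read each file line by line and
    # accumulate counts in a single in-memory table.
    counts = Counter()
    for path in paths:
        with open(path) as f:
            for line in f:
                counts.update(line.split())
    return counts
```

There are no workers to spawn, no partial results to merge, and no serialization between processes, which is why a solution like this is often faster than a MapReduce setup when the problem fits on one machine.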

Niels Basjes, answered Sep 30 '22