I'm trying to understand the boundaries of Hadoop and map/reduce, and it would help to know a non-trivial problem, or class of problems, that we know map/reduce can't help with.
It would also be interesting to hear about cases where changing one factor of the problem would make it tractable with map/reduce.
Thank you
MapReduce cannot inherently execute recursive or iterative jobs [12]. Its purely batch-oriented behavior is another problem: all of the input must be ready before the job starts, which rules MapReduce out for online and stream-processing use cases.
MapReduce does its work with coarse-grained tasks that are too heavyweight for iterative algorithms. It also has no awareness of the overall pipeline of Map and Reduce steps, so it cannot cache intermediate data in memory for faster performance.
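To make the iteration problem concrete, here is a minimal sketch of how an iterative algorithm usually has to be expressed on plain Hadoop: a driver loop that submits one full MapReduce job per pass, writing all intermediate state to HDFS and reading it back on the next pass. The class name, paths, and fixed iteration count are made up for illustration; a real driver would set algorithm-specific mapper/reducer classes and test for convergence.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IterativeDriver {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    int maxIterations = 10; // assumed fixed count; real code would check convergence instead

    for (int i = 0; i < maxIterations; i++) {
      // Every pass is a brand-new job: it reads the previous pass's output from
      // HDFS and writes its own output back to HDFS. Nothing is cached in memory
      // between passes, and each job pays the full scheduling/startup cost.
      Job pass = Job.getInstance(conf, "iteration-" + i);
      pass.setJarByClass(IterativeDriver.class);
      // pass.setMapperClass(...); pass.setReducerClass(...); // algorithm-specific
      FileInputFormat.addInputPath(pass, new Path(args[0] + "/iter-" + i));
      FileOutputFormat.setOutputPath(pass, new Path(args[0] + "/iter-" + (i + 1)));
      if (!pass.waitForCompletion(true)) {
        System.exit(1); // abort the whole computation if any pass fails
      }
    }
  }
}
```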
Anything that involves doing operations on a large set of data, where the problem can be broken down into smaller independent sub-problems whose results can then be aggregated to produce the answer to the larger problem. A trivial example would be calculating the sum of a huge set of numbers.
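For illustration, here is a rough sketch of that sum example as a Hadoop job. The class name and input/output paths are made up; it assumes one number per line of input, and reuses the reducer as a combiner so partial sums are pre-aggregated on each mapper node before the single global reduce.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SumJob {
  // Mapper: each input line holds one number; emit it under a single shared key.
  public static class SumMapper extends Mapper<LongWritable, Text, NullWritable, LongWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String s = line.toString().trim();
      if (!s.isEmpty()) {
        ctx.write(NullWritable.get(), new LongWritable(Long.parseLong(s)));
      }
    }
  }

  // Reducer (also used as combiner): all values arrive under the same key; add them up.
  public static class SumReducer extends Reducer<NullWritable, LongWritable, NullWritable, LongWritable> {
    @Override
    protected void reduce(NullWritable key, Iterable<LongWritable> values, Context ctx)
        throws IOException, InterruptedException {
      long total = 0;
      for (LongWritable v : values) total += v.get();
      ctx.write(key, new LongWritable(total));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "sum");
    job.setJarByClass(SumJob.class);
    job.setMapperClass(SumMapper.class);
    job.setCombinerClass(SumReducer.class); // combine partial sums on each mapper node
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(LongWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```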
Two things come to mind:
Anything that requires real-time / interactive / low latency response times. There is a fixed cost incurred for any job submitted to Hadoop.
Any problem that is not embarrassingly parallel. Hadoop can handle a lot of problems that require some simple interdependency between data, since records are joined during the reduce phase. However, certain graph processing and machine learning algorithms are difficult to write in Hadoop because they involve too many operations that depend on one another. Some machine learning algorithms require very low-latency random access to a large set of data, which Hadoop does not provide out of the box.
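As a sketch of the kind of "simple interdependency" that does fit the model, here is a hypothetical reduce-side join: mappers tag each record with its source table and key it by the join column, so that all records sharing a key meet in a single reduce call where the join happens. The input format (CSV lines of the form "table,joinKey,payload" with tables "users" and "orders") and all class names are assumptions for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReduceSideJoin {
  // Mapper: key each record by the join column, keep the table tag with the payload
  // so the reducer can tell the two sides apart.
  public static class TagMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split(",", 3); // table, joinKey, payload
      ctx.write(new Text(parts[1]), new Text(parts[0] + ":" + parts[2]));
    }
  }

  // Reducer: every record with the same join key arrives in this one call,
  // which is where the actual join is performed.
  public static class JoinReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context ctx)
        throws IOException, InterruptedException {
      List<String> users = new ArrayList<>();
      List<String> orders = new ArrayList<>();
      for (Text v : values) {
        String s = v.toString();
        if (s.startsWith("users:")) users.add(s.substring(6));
        else if (s.startsWith("orders:")) orders.add(s.substring(7));
      }
      for (String u : users)
        for (String o : orders)
          ctx.write(key, new Text(u + "\t" + o));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "reduce-side-join");
    job.setJarByClass(ReduceSideJoin.class);
    job.setMapperClass(TagMapper.class);
    job.setReducerClass(JoinReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The limitation described above shows up as soon as the dependencies stop being expressible as a single shuffle-and-join like this: anything that needs many rounds of such joins, or random access to the whole dataset, forces repeated full jobs.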