Are there any alternative paradigms to MapReduce (Google, Hadoop)? Is there any other reasonable way how to split & merge big problems?
Apache Spark Hailed as the de-facto successor to the already popular Hadoop, Apache Spark is used as a computational engine for Hadoop data. Unlike Hadoop, Spark provides an increase in computational speed and offers full support for the various applications that the tool offers.
In 2021, MapReduce isn't a processing model that will turn many heads. It's been around for well over a decade, and is easy to write off as an idea that might've been groundbreaking in its time, but is dated today. MapReduce was a big deal for a reason, though, and it has a lot to offer, even now.
So when people say that Spark is replacing Hadoop, it actually means that big data professionals now prefer to use Apache Spark for processing the data instead of Hadoop MapReduce. MapReduce and Hadoop are not the same – MapReduce is just a component to process the data in Hadoop and so is Spark.
The use of JavaScript code with scope for the mapReduce functions has been deprecated since version 4.2.
Definitively. Check out, for example, Bulk Synchronous Parallel. Map/Reduce is in fact a very restricted way of reducing problems, however that restriction makes it manageable in a framework like Hadoop. The question is if it is less trouble to press your problem into a Map/Reduce setting, or if its easier to create a domain-specific parallelization scheme and having to take care of all the implementation details yourself. Pig, in fact, is only an abstraction layer on top of Hadoop which automates many standard problem transformations from not-Map-Reduce-y to Map-Reduce-compatible.
Edit 26.1.13: Found a nice up-to-date overview here
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With