Map Reduce is a pattern that seems to have gained a lot of traction lately, and I am starting to see it manifest in one of my projects, which is focused on an event processing pipeline (iPhone accelerometer and GPS data). I needed to build a lot of infrastructure for this project; in fact, it outweighs the logic code interacting with it by 2x. Some of the components I built were EventProcessors (with input and output buffers, timing etc.), EventListeners, Aggregators and a staged Pipeline.
This leads me to my question: what is the "common" required infrastructure for map reduce? Since I work with .NET a lot, I can see map reduce infrastructure built into the framework and language constructs. Functional languages support this paradigm per se. It seems every language can be used with map reduce. There are even languages built around that concept (e.g. Go).
Apache Hadoop brings Map-Reduce to Java. Google has patented a map-reduce framework. What kind of infrastructure do they provide to enable map reduce? What are the constructs exhibited in functional languages to implement map reduce? What does a map-reduce framework need to provide, and what should it provide?
MapReduce is an evolving programming framework for massive data applications proposed by Google. It is based on functional programming (Peyton Jones, 1987), where the designer defines map and reduce tasks to process large sets of distributed data.
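To make the functional roots concrete, here is a minimal sketch in Python of the two primitives the name comes from. The function name `total_squared` and the sample input are made up for illustration; this only shows the single-machine building blocks, not what Google's or Hadoop's framework does internally.

```python
from functools import reduce

def total_squared(values):
    """Map each value to its square, then reduce the squares to one sum."""
    # map: applied to each element independently, so it could run in parallel
    squares = map(lambda x: x * x, values)
    # reduce: folds the mapped values into a single result, starting from 0
    return reduce(lambda acc, x: acc + x, squares, 0)

print(total_squared([1, 2, 3, 4]))  # prints 30
```

Because the map step has no shared state, it is the part a framework can spread across many workers; the reduce step is where the partial results come back together.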
MapReduce is a programming paradigm that enables massive scalability across hundreds or thousands of servers in a Hadoop cluster. As the processing component, MapReduce is the heart of Apache Hadoop. The term "MapReduce" refers to two separate and distinct tasks that Hadoop programs perform.
MapReduce is both a programming model and a software framework used for processing enormous amounts of data. A MapReduce program works in two stages, namely Map and Reduce. Map tasks deal with splitting and mapping the data, while Reduce tasks shuffle and reduce it.
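The stages described above can be sketched as a single-process word count. This is only an illustration under stated assumptions: the names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical and not part of Hadoop or any other framework, and a real framework would run the map and reduce calls on different machines and handle the shuffle over the network.

```python
from collections import defaultdict

def map_phase(document):
    """Emit one (word, 1) pair per word in a single input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Group all emitted values by key so each key ends up at one reducer."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # Each document stands in for one input split (hypothetical setup).
    emitted = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(emitted)
    return dict(reduce_phase(key, values) for key, values in grouped.items())

print(word_count(["a b a", "b c"]))  # prints {'a': 2, 'b': 2, 'c': 1}
```

Seen this way, the infrastructure a framework has to supply is mostly everything around these three functions: splitting the input, scheduling the tasks, moving the grouped data between them, and recovering from failed workers.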
Other systems like Couchbase and MongoDB have also used map-reduce as a query engine. Hadoop made a big splash, but these days it's pretty niche in practice. That's not because MapReduce itself is outdated, but rather because the problems that it solves are situations we now try to avoid.
Well, Hadoop's file system (HDFS) is modeled on the Google File System. The Hadoop MapReduce implementation is also based on a paper by Google. For both Google and Hadoop, the component that allows MapReduce to run successfully over massive amounts of data in parallel is the distributed file system.