 

What is the principle of "code moving to data" rather than data to code?

In a recent discussion about distributed processing and streaming, I came across the concept of 'code moving to data'. Can someone please explain it? The reference for this phrase is MapReduceWay.

It is stated in terms of Hadoop in another question, but I still could not find an explanation of the principle in a tech-agnostic way.

Sankalp asked Nov 15 '16


People also ask

What is moving code to data?

Data locality is the concept of moving processing code to the data within your systems, instead of forcing huge data volumes through the network to get it processed.

What does the term data locality refer to?

Data locality means moving the computation to the data rather than moving the data to the computation, which saves bandwidth. This minimizes network congestion and increases the overall throughput of the system.


1 Answer

The basic idea is easy: if code and data are on different machines, one of them must be moved to the other machine before the code can be executed on the data. If the code is smaller than the data, it is better to send the code to the machine holding the data than the other way around, assuming all the machines are equally fast and code-compatible. (Arguably you can send the source and JIT-compile it as needed.)

In the world of Big Data, the code is almost always smaller than the data.
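As a rough back-of-envelope illustration of that size gap (all numbers below are made up for the sketch, not measurements from any real system), compare how many bytes cross the network under each approach:

    # Back-of-envelope comparison with made-up numbers: bytes crossing the
    # network for one job over a 10-node cluster (all sizes are hypothetical).

    NODES = 10
    DATA_PER_NODE = 100 * 10**9    # 100 GB of locally stored data per node
    JOB_CODE_SIZE = 50 * 10**3     # 50 KB of application code
    RESULT_PER_NODE = 1 * 10**6    # 1 MB of already-aggregated results per node

    # "Data to code": pull every partition to one machine and process it there.
    data_to_code = NODES * DATA_PER_NODE

    # "Code to data": ship the code to every node; only small results come back.
    code_to_data = NODES * (JOB_CODE_SIZE + RESULT_PER_NODE)

    print(f"data moved to code: {data_to_code / 10**9:,.0f} GB over the network")
    print(f"code moved to data: {code_to_data / 10**6:,.1f} MB over the network")

With these assumed sizes, shipping the data costs about 1,000 GB of transfer, while shipping the code costs about 10.5 MB.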

On many supercomputers, the data is partitioned across many nodes, and all the code for the entire application is replicated on all nodes, precisely because the entire application is small compared to even the locally stored data. Then any node can run the part of the program that applies to the data it holds. No need to send the code on demand.
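To make the pattern concrete, here is a minimal single-process sketch. The Node class, the partitions, and count_large are hypothetical stand-ins (not any particular framework's API): each "node" owns a partition of the data, the driver hands every node only a small function, and only the small per-node results travel back to be combined. The partitions themselves never move.

    from concurrent.futures import ThreadPoolExecutor

    # Hypothetical node: holds its own data partition; run() executes whatever
    # small function it is handed against that local data and returns only the
    # (small) result. In a real cluster the function would be serialized and
    # sent over the network; the partition would stay where it is.
    class Node:
        def __init__(self, partition):
            self.partition = partition

        def run(self, fn):
            return fn(self.partition)

    # Three nodes, each owning a slice of the data set.
    nodes = [
        Node([3, 1, 4, 1, 5]),
        Node([9, 2, 6, 5, 3]),
        Node([5, 8, 9, 7, 9]),
    ]

    # The "code" being moved: a tiny function, a few hundred bytes, versus
    # partitions that could be gigabytes in a real system.
    def count_large(partition):
        return sum(1 for x in partition if x >= 5)

    # Ship the function to every node in parallel, then combine the small
    # per-node results on the driver.
    with ThreadPoolExecutor() as pool:
        partial_counts = list(pool.map(lambda n: n.run(count_large), nodes))

    print(partial_counts, sum(partial_counts))   # [1, 3, 5] 9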

Ira Baxter answered Oct 04 '22