In a recent discussion about distributed processing and streaming, I came across the concept of 'code moving to data'. Can someone please help explain it? The reference for this phrase is MapReduceWay.
It is usually stated in terms of Hadoop, but I still could not figure out an explanation of the principle in a tech-agnostic way.
Data locality is the concept of moving the processing code to the data within your systems, instead of forcing huge volumes of data through the network to be processed.
Data locality means moving the computation to the data rather than moving the data to the computation, which saves bandwidth. This minimizes network congestion and increases the overall throughput of the system.
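As a rough sketch of what this looks like in a scheduler (the function and data structures below are hypothetical, not an actual Hadoop API), the idea is to prefer launching a task on a node that already stores the relevant block:

```python
# Hypothetical locality-aware scheduling sketch (not a real Hadoop API):
# prefer running a task on a node that already stores the data block.

def pick_node(block_replicas, idle_nodes):
    """Pick a node for the task, preferring one that already holds the block."""
    data_local = [node for node in idle_nodes if node in block_replicas]
    if data_local:
        return data_local[0]   # data-local: the block never crosses the network
    return idle_nodes[0]       # remote: the block must be shipped to this node

# Example: block replicas live on nodes 2, 5 and 7; nodes 3 and 5 are idle.
print(pick_node({2, 5, 7}, [3, 5]))   # -> 5 (run the code where the data is)
```

Only when no data-local node is free does the block have to travel over the network, which is exactly the case such schedulers try to minimize.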
The basic idea is simple: if the code and the data are on different machines, one of them must be moved to the other machine before the code can be executed on the data. If the code is smaller than the data, it is better to send the code to the machine holding the data than the other way around, assuming all the machines are equally fast and code-compatible. (Arguably you could even send the source and JIT-compile it as needed.)
In the world of Big Data, the code is almost always smaller than the data.
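A back-of-the-envelope comparison makes the asymmetry concrete; the sizes and link speed below are made up, but typical in scale:

```python
# Illustrative numbers only: compare bytes moved when shipping the job code
# to every node vs. pulling all the data to one central machine.

code_size_gb = 0.05        # a ~50 MB packaged job, assumed
data_size_gb = 10_000      # 10 TB of input spread across the cluster
nodes = 100
link_gb_per_s = 1.25       # ~10 Gbit/s network link, assumed

move_code_gb = code_size_gb * nodes   # ship the code to every node: ~5 GB
move_data_gb = data_size_gb           # ship the data to the code: 10,000 GB

print(f"move code: {move_code_gb:.0f} GB (~{move_code_gb / link_gb_per_s:.0f} s on one link)")
print(f"move data: {move_data_gb:,.0f} GB (~{move_data_gb / link_gb_per_s / 3600:.1f} h on one link)")
```

Shipping a few gigabytes of code to a hundred nodes is cheap; funnelling terabytes of data through a single link is not.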
On many supercomputers, the data is partitioned across many nodes, and all the code for the entire application is replicated on all nodes, precisely because the entire application is small compared to even the locally stored data. Then any node can run the part of the program that applies to the data it holds. No need to send the code on demand.
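A minimal sketch of that pattern (the shard path and the NODE_ID variable are assumptions for illustration): every node runs the same script, but each one reads only the partition stored locally and sends back only its small partial result:

```python
# Sketch of "replicate the code, partition the data": the same script runs on
# every node and touches only the shard stored locally.

import os

def process(record):
    return len(record)                 # placeholder per-record computation

node_id = os.environ.get("NODE_ID", "0")
shard_path = f"/local/data/shard-{node_id}.txt"   # hypothetical local shard

with open(shard_path) as shard:
    partial_result = sum(process(line) for line in shard)

# Only this small partial result crosses the network to be combined with the
# other nodes' results; the raw shard never leaves the node.
print(partial_result)
```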