Why does Hadoop need to introduce these new classes? They just seem to complicate the interface.
Java's built-in serialization is too bulky and slow for Hadoop's needs. That's why the Hadoop community put `Writable` in place. `WritableComparable` is a combination of the `Writable` and `Comparable` interfaces. `int` is a primitive type, so it cannot be used as a key or value; keys and values must be object types that implement these interfaces.
`IntWritable` is the Hadoop flavour of `Integer`, optimized to provide serialization in Hadoop. Java serialization is too heavyweight for Hadoop, hence the box classes in Hadoop implement serialization through the interface called `Writable`. `Writable` can serialize an object in a very lightweight way.
The `Writable` interface: a serializable object which implements a simple, efficient serialization protocol, based on `DataInput` and `DataOutput`. Any key or value type in the Hadoop MapReduce framework implements this interface.
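To make the protocol concrete, here is a minimal sketch of the pattern. The `Writable` interface and `IntBox` class below are local stand-ins written for illustration (the real interface lives in `org.apache.hadoop.io` and the real box class is `IntWritable`); `DataInput` and `DataOutput` are the actual `java.io` types the protocol is built on:

```java
import java.io.*;

// A minimal local stand-in for Hadoop's Writable interface (assumed shape;
// the real one is org.apache.hadoop.io.Writable).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// An IntWritable-like box class: serializes a single int as 4 raw bytes,
// with none of the class-descriptor overhead of java.io.Serializable.
class IntBox implements Writable {
    private int value;
    IntBox() {}                       // no-arg constructor, needed for deserialization
    IntBox(int value) { this.value = value; }
    int get() { return value; }
    public void write(DataOutput out) throws IOException { out.writeInt(value); }
    public void readFields(DataInput in) throws IOException { value = in.readInt(); }
}

public class WritableDemo {
    public static void main(String[] args) throws IOException {
        // Serialize: the object writes only its payload bytes.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new IntBox(42).write(new DataOutputStream(buf));
        System.out.println("bytes on the wire: " + buf.size()); // 4

        // Deserialize: an empty box repopulates itself from the stream.
        IntBox copy = new IntBox();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(buf.toByteArray())));
        System.out.println("round-tripped value: " + copy.get()); // 42
    }
}
```

Note the design: instead of the JVM writing class metadata into the stream (as `Serializable` does), the reader is expected to already know the type and simply calls `readFields` on a fresh instance, so the wire format is just the payload.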
`Text` is a replacement for the `UTF8` class, which was deprecated because it didn't support strings whose encoding was over 32,767 bytes, and because it used Java's modified UTF-8. `Text` uses standard UTF-8 instead, which makes it potentially easier to interoperate with other tools that understand UTF-8.
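The difference between the two encodings is easy to observe with only the standard library: `DataOutputStream.writeUTF` uses Java's modified UTF-8, where a supplementary character is encoded as a 6-byte surrogate pair (plus a 2-byte length prefix), while standard UTF-8 encodes it in 4 bytes:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class Utf8Demo {
    public static void main(String[] args) throws IOException {
        // One supplementary code point (U+1F600), two Java chars.
        String s = "\uD83D\uDE00";

        // Standard UTF-8, as used by Hadoop's Text class: 4 bytes.
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 4

        // Modified UTF-8, as written by DataOutputStream.writeUTF:
        // each surrogate is encoded as a 3-byte sequence (6 bytes total),
        // preceded by a 2-byte length prefix, giving 8 bytes.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        new DataOutputStream(buf).writeUTF(s);
        System.out.println(buf.size()); // 8
    }
}
```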
These classes exist in order to handle objects in the Hadoop way. For example, Hadoop uses `Text` instead of Java's `String`. The `Text` class in Hadoop is similar to a Java `String`, but `Text` implements interfaces like `Comparable`, `Writable` and `WritableComparable`.

These interfaces are all necessary for MapReduce: the `Comparable` interface is used for comparing keys when the reducer sorts them, and `Writable` can write the result to the local disk. Hadoop does not use Java's `Serializable` because it is too heavyweight; `Writable` can serialize a Hadoop object in a very lightweight way.
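The two roles described above, serialization and key ordering, can be sketched together. The `WritableComparable` interface and `TextKey` class below are illustrative stand-ins written for this answer (the real ones are `org.apache.hadoop.io.WritableComparable` and `org.apache.hadoop.io.Text`), and an ordinary `Collections.sort` stands in for the framework's shuffle/sort phase:

```java
import java.io.*;
import java.util.*;

// A minimal local stand-in for Hadoop's WritableComparable (assumed shape).
interface WritableComparable<T> extends Comparable<T> {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// A Text-like key: write/readFields make it serializable, and compareTo
// lets the framework order keys before they reach the reducer.
class TextKey implements WritableComparable<TextKey> {
    private String value = "";
    TextKey() {}
    TextKey(String value) { this.value = value; }
    String get() { return value; }
    public void write(DataOutput out) throws IOException { out.writeUTF(value); }
    public void readFields(DataInput in) throws IOException { value = in.readUTF(); }
    public int compareTo(TextKey other) { return value.compareTo(other.value); }
    public String toString() { return value; }
}

public class SortDemo {
    public static void main(String[] args) {
        // Simulate the sort the framework performs on map output keys.
        List<TextKey> keys = new ArrayList<>(Arrays.asList(
                new TextKey("banana"), new TextKey("apple"), new TextKey("cherry")));
        Collections.sort(keys);
        System.out.println(keys); // [apple, banana, cherry]
    }
}
```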