Difference between Text and String in Hadoop

1 Answer

The binary representation of a Text object is a variable-length integer containing the number of bytes in the UTF-8 representation of the string, followed by the UTF-8 bytes themselves.
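The layout described above can be sketched with the JDK alone. This is an illustration, not Hadoop's own serialization code: the class and method names are made up, and it assumes the UTF-8 form is at most 127 bytes, a range in which Hadoop's variable-length integer occupies a single byte (longer values use a multi-byte prefix that is not modeled here).

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class TextLayoutSketch {
    // Serializes s as one length byte followed by its UTF-8 bytes.
    // Assumption: the UTF-8 form is at most 127 bytes, so the
    // variable-length integer fits in a single byte.
    public static byte[] serializeShort(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 127) {
            throw new IllegalArgumentException("sketch only handles up to 127 bytes");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(utf8.length);          // length prefix (single-byte case)
        out.write(utf8, 0, utf8.length); // payload: the UTF-8 bytes
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // 5 characters, but the accented e takes 2 bytes in UTF-8
        byte[] b = serializeShort("h\u00E9llo");
        System.out.println(b.length); // 7: one length byte plus 6 payload bytes
        System.out.println(b[0]);     // 6: the byte count, not the character count
    }
}
```

Note that the prefix counts encoded bytes, not characters, which is why it reads 6 for a five-character string.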

Text is a replacement for the UTF8 class, which was deprecated because it didn’t support strings whose encoding was over 32,767 bytes, and because it used Java’s modified UTF-8.
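To see how Java's modified UTF-8 differs from standard UTF-8, the sketch below compares encoded lengths using only the JDK. It uses DataOutputStream.writeUTF as a readily available producer of modified UTF-8; that it behaves like the deprecated UTF8 class is an assumption, and the class and method names are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ModifiedUtf8Demo {
    // Length of s in standard UTF-8.
    public static int standardLength(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Length of s in Java's modified UTF-8. writeUTF emits a 2-byte
    // length header first, so the header is subtracted.
    public static int modifiedLength(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size() - 2;
    }

    public static void main(String[] args) {
        String nul = "\u0000";
        String supplementary = "\uD801\uDC00"; // one code point, U+10400
        // Modified UTF-8 encodes the NUL character in two bytes.
        System.out.println(standardLength(nul) + " vs " + modifiedLength(nul)); // 1 vs 2
        // Modified UTF-8 encodes each surrogate separately, three bytes each.
        System.out.println(standardLength(supplementary) + " vs "
                + modifiedLength(supplementary)); // 4 vs 6
    }
}
```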

Furthermore, Text uses standard UTF-8, which makes it potentially easier to interoperate with other tools that understand UTF-8.

In brief, Text differs from String in the following ways:

Indexing: Because of its emphasis on using standard UTF-8, there are some differences between Text and the Java String class. Indexing for the Text class is in terms of position in the encoded byte sequence, not the Unicode character in the string, or the Java char code unit (as it is for String).

For instance, charAt() returns an int representing a Unicode code point, unlike the String variant that returns a char.
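The gap between byte offsets and char indices is easy to see with a string that mixes one-, two-, three- and four-byte characters. The sketch below uses only the JDK and a hypothetical helper, byteOffset, to compute the position a byte-indexed API would report. With a real Text object, methods such as find() and charAt() work in these byte offsets.

```java
import java.nio.charset.StandardCharsets;

public class ByteOffsetIndexing {
    // Byte offset, in the UTF-8 encoding of s, at which the char at
    // charIndex starts. charIndex must not split a surrogate pair.
    public static int byteOffset(String s, int charIndex) {
        return s.substring(0, charIndex).getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // U+0041 (1 byte), U+00DF (2 bytes), U+6771 (3 bytes), U+10400 (4 bytes)
        String s = "\u0041\u00DF\u6771\uD801\uDC00";

        System.out.println(s.length());                                // 5 char code units
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length); // 10 bytes

        int charIdx = s.indexOf("\u6771");
        System.out.println(charIdx);                // 2: index in char code units
        System.out.println(byteOffset(s, charIdx)); // 3: index in encoded bytes

        int suppIdx = s.indexOf("\uD801\uDC00");
        System.out.println(suppIdx);                // 3
        System.out.println(byteOffset(s, suppIdx)); // 6

        // String.charAt returns half of the surrogate pair as a char,
        // while codePointAt returns the whole code point as an int.
        System.out.println(Integer.toHexString(s.charAt(suppIdx)));      // d801
        System.out.println(Integer.toHexString(s.codePointAt(suppIdx))); // 10400
    }
}
```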

Iteration: Iterating over the Unicode characters in Text is complicated by the use of byte offsets for indexing, since you can’t just increment the index.
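A minimal sketch of why the index cannot simply be incremented: each step has to advance by the length of the current UTF-8 sequence, which depends on its lead byte. The decoder below is hand-written with the JDK only and assumes well-formed UTF-8. With a real Text object, the usual approach is to wrap getBytes() (up to getLength()) in a ByteBuffer and call Text.bytesToCodePoint() repeatedly.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf8CodePointWalk {
    // Number of bytes in the UTF-8 sequence that starts with this lead byte.
    public static int sequenceLength(byte lead) {
        int b = lead & 0xFF;
        if (b < 0x80) return 1;
        if ((b >> 5) == 0b110) return 2;
        if ((b >> 4) == 0b1110) return 3;
        if ((b >> 3) == 0b11110) return 4;
        throw new IllegalArgumentException("not a UTF-8 lead byte: " + b);
    }

    // Decodes the code point whose encoding starts at the given byte offset.
    public static int decodeAt(byte[] utf8, int offset) {
        int len = sequenceLength(utf8[offset]);
        if (len == 1) return utf8[offset];
        // Keep only the payload bits of the lead byte.
        int cp = (utf8[offset] & 0xFF) & (0xFF >> (len + 1));
        for (int i = 1; i < len; i++) {
            cp = (cp << 6) | (utf8[offset + i] & 0x3F); // 6 payload bits per continuation byte
        }
        return cp;
    }

    // Walks the first `length` bytes and collects the code points in order.
    public static List<Integer> codePoints(byte[] utf8, int length) {
        List<Integer> out = new ArrayList<>();
        int offset = 0;
        while (offset < length) {
            out.add(decodeAt(utf8, offset));
            offset += sequenceLength(utf8[offset]); // advance by 1 to 4 bytes, not by 1
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u0041\u00DF\u6771\uD801\uDC00".getBytes(StandardCharsets.UTF_8);
        for (int cp : codePoints(utf8, utf8.length)) {
            System.out.println(Integer.toHexString(cp)); // 41, df, 6771, 10400
        }
    }
}
```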

Mutable: Another difference from String is that Text is mutable (like all Writable implementations in Hadoop, except NullWritable, which is a singleton). You can reuse a Text instance by calling one of the set() methods on it.
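The reuse pattern can be illustrated without Hadoop on the classpath. ReusableUtf8 below is a hypothetical stand-in written for this answer, not Hadoop's Text; it only mimics the idea of overwriting one instance's contents through set() instead of allocating a new object.

```java
import java.nio.charset.StandardCharsets;

public class MutableUtf8Sketch {
    // Hypothetical stand-in for a mutable, reusable UTF-8 holder.
    // This is NOT Hadoop's Text class.
    public static final class ReusableUtf8 {
        private byte[] bytes = new byte[0];
        private int length;

        // Overwrites the contents in place, growing the buffer only when needed.
        public void set(String s) {
            byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
            if (utf8.length > bytes.length) {
                bytes = new byte[utf8.length];
            }
            System.arraycopy(utf8, 0, bytes, 0, utf8.length);
            length = utf8.length;
        }

        public int getLength() { return length; }       // number of valid bytes
        public int capacity()  { return bytes.length; } // size of the backing buffer

        @Override
        public String toString() {
            return new String(bytes, 0, length, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        ReusableUtf8 t = new ReusableUtf8();
        t.set("hadoop");
        ReusableUtf8 alias = t; // a second reference to the same instance
        t.set("pig");           // reuse: no new object is created

        System.out.println(alias);          // pig: the alias sees the new contents
        System.out.println(t.getLength());  // 3
        System.out.println(t.capacity());   // 6: the buffer was not shrunk
    }
}
```

The same caution applies to the real class: the Text Javadoc states that the array returned by getBytes() is valid only up to getLength(), and any reference you keep to a reused instance will see its later contents.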

Resorting to String:

Text doesn’t have as rich an API for manipulating strings as java.lang.String, so in many cases you need to convert the Text object to a String. This is done in the usual way, by calling the toString() method.

For more details, see Hadoop: The Definitive Guide.
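With a real Text value the call is simply value.toString(). The JDK-only sketch below shows what that conversion amounts to, under the assumption that the payload is a UTF-8 byte array with a known valid length: decode the bytes once, then use the ordinary String API. The class name, the decode helper and the sample data are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class TextToStringSketch {
    // Decodes the first `length` bytes of a UTF-8 buffer into a String.
    public static String decode(byte[] utf8, int length) {
        return new String(utf8, 0, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Stand-in for the bytes held by a Text value.
        byte[] payload = "alpha,beta,gamma".getBytes(StandardCharsets.UTF_8);

        // After conversion, the full String API is available.
        String line = decode(payload, payload.length);
        String[] fields = line.split(",");

        System.out.println(fields.length);            // 3
        System.out.println(fields[1].toUpperCase());  // BETA
    }
}
```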


answered Sep 22 '22 07:09

SSaikia_JtheRocker
