
What is the difference between #encode and #force_encoding in ruby?

Tags: ruby, encoding

I really do not understand the difference between #encode and #force_encoding in Ruby for the String class. I understand that "kam".force_encoding("UTF-8") will force "kam" to be in UTF-8 encoding, but how is #encode(encoding) different?

http://ruby-doc.org/core-2.0/String.html#method-i-encoding

Kamilski81 asked Feb 06 '14 21:02


2 Answers

The difference is significant. force_encoding changes the encoding the string is tagged with, but does not change the string itself, i.e. it does not change its byte representation in memory:

    'łał'.bytes #=> [197, 130, 97, 197, 130]
    'łał'.force_encoding('ASCII').bytes #=> [197, 130, 97, 197, 130]
    'łał'.force_encoding('ASCII') #=> "\xC5\x82a\xC5\x82"

encode assumes that the current encoding is correct and rewrites the bytes so that the string reads the same way in the target encoding:

    'łał'.encode('UTF-16') #=> 'łał'
    'łał'.encode('UTF-16').bytes #=> [254, 255, 1, 66, 0, 97, 1, 66]

In short, force_encoding changes how the existing bytes are interpreted, while encode changes the bytes themselves so that the characters stay the same (if possible).
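To see both behaviours side by side, here is a minimal sketch. The variable names are only illustrative, and it uses UTF-16BE rather than UTF-16 so that no byte order mark is prepended to the output. The `\u` escapes are just a way to write 'łał' that does not depend on the source file's encoding.

```ruby
original = "\u0142a\u0142"            # 'łał' as UTF-8: 3 characters, 5 bytes

# force_encoding: same bytes, different label.
# dup keeps `original` untouched, because force_encoding mutates its receiver.
relabeled = original.dup.force_encoding(Encoding::BINARY)

# encode: same characters, different bytes. Returns a new string.
transcoded = original.encode(Encoding::UTF_16BE)

p original.bytes                       # [197, 130, 97, 197, 130]
p relabeled.bytes == original.bytes    # true  (bytes untouched)
p relabeled.length                     # 5     (every byte now counts as a character)
p transcoded.bytes                     # [1, 66, 0, 97, 1, 66]
p transcoded.length                    # 3     (still three characters)
p transcoded.encode(Encoding::UTF_8) == original  # true (round trip)
```

Note that the relabeled string reports a different length even though nothing in memory changed, while the transcoded string keeps its length but has entirely different bytes.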

BroiSatse answered Sep 24 '22 06:09


Read the "Changing an encoding" section of the Ruby Encoding documentation:

The associated Encoding of a String can be changed in two different ways.

First, it is possible to set the Encoding of a string to a new Encoding without changing the internal byte representation of the string, with String#force_encoding. This is how you can tell Ruby the correct encoding of a string.

Example:

    string = "R\xC3\xA9sum\xC3\xA9"
    string.encoding #=> #<Encoding:ISO-8859-1>
    string.force_encoding(Encoding::UTF_8) #=> "R\u00E9sum\u00E9"
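A common situation where this matters is data that arrives tagged as binary (for example from a socket or a file opened in binary mode) when you know it is really UTF-8. The sketch below simulates that with `String#b`, which returns a binary-tagged copy. The variable names are illustrative, and `valid_encoding?` is used as a sanity check on the chosen label.

```ruby
# Simulate incoming bytes that Ruby has tagged as binary (ASCII-8BIT).
raw = "R\xC3\xA9sum\xC3\xA9".b

p raw.encoding == Encoding::BINARY   # true
p raw.length                         # 8 (counted byte by byte)

# Tell Ruby what the bytes really are. Nothing is converted.
text = raw.dup.force_encoding(Encoding::UTF_8)

p text.valid_encoding?               # true
p text.length                        # 6 (two 2-byte characters now count once each)
p text == "R\u00E9sum\u00E9"         # true

# A wrong label is not detected by force_encoding itself:
# these are ISO-8859-1 bytes, which are not valid UTF-8.
mislabelled = "R\xE9sum\xE9".b.force_encoding(Encoding::UTF_8)
p mislabelled.valid_encoding?        # false
```

Because force_encoding never inspects the bytes, checking `valid_encoding?` afterwards is one way to catch a wrong guess.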

Second, it is possible to transcode a string, i.e. translate its internal byte representation to another encoding. Its associated encoding is also set to the other encoding. See String#encode for the various forms of transcoding, and the Encoding::Converter class for additional control over the transcoding process.

Example:

    string = "R\u00E9sum\u00E9"
    string.encoding #=> #<Encoding:UTF-8>
    string = string.encode!(Encoding::ISO_8859_1) #=> "R\xE9sum\xE9"
    string.encoding #=> #<Encoding:ISO-8859-1>
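Transcoding can fail when a character has no equivalent in the target encoding. The following sketch shows what that looks like and one of the fallback forms of String#encode. The sample text and the "?" replacement are arbitrary choices for illustration.

```ruby
# 'ł' (U+0142) exists in UTF-8 but has no ISO-8859-1 equivalent.
text = "R\u00E9sum\u00E9 \u0142"

error_class = nil
begin
  text.encode(Encoding::ISO_8859_1)
rescue Encoding::UndefinedConversionError => e
  error_class = e.class
end
p error_class == Encoding::UndefinedConversionError  # true

# Ask encode to substitute a replacement for unconvertible characters.
replaced = text.encode(Encoding::ISO_8859_1, undef: :replace, replace: "?")

p replaced.encoding == Encoding::ISO_8859_1   # true
p replaced.bytes   # [82, 233, 115, 117, 109, 233, 32, 63]
```

Here 'é' becomes the single byte 233 and 'ł' becomes "?" (byte 63). Whether silently replacing characters is acceptable depends on the application.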
Arup Rakshit answered Sep 23 '22 06:09