
Ruby: converting from any encoding to ASCII

Tags:

ruby

encoding

I mainly have to deal with the English alphabet and the usual punctuation marks; I don't have to worry about European accents. My only concern is when a user pastes something copied from the web that includes, for instance, a typographic apostrophe. When I puts it in the console (on Win7), it outputs

"ItΓÇÖs" # whereas it is actually "It's"

So my main question is: is there an end-it-all conversion method I can use in Ruby that properly replaces all of the ,.;?!"'~` _- characters with their ASCII counterparts?

I understand very little about encodings. If you think this is the wrong question to ask, which may well be the case, please advise me on what I should look for instead.

Thank you

Nik So asked Feb 22 '11 21:02




2 Answers

I work in publishing, where we deal with this a lot. We have had success with stringex (https://github.com/rsl/stringex). It provides a to_ascii method that normalizes Unicode dashes and similar characters.
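For a rough idea of what this kind of normalization involves, here is a minimal standard-library sketch. It is not stringex's implementation: the table `ASCII_PUNCTUATION` and the helper `asciify_punctuation` are hypothetical names made up for illustration, and the table only covers a handful of common typographic characters.

```ruby
# Hypothetical lookup table (not taken from stringex): typographic
# characters and the plain-ASCII stand-ins chosen for this sketch.
ASCII_PUNCTUATION = {
  "\u2018" => "'",   # left single quotation mark
  "\u2019" => "'",   # right single quotation mark / apostrophe
  "\u201C" => '"',   # left double quotation mark
  "\u201D" => '"',   # right double quotation mark
  "\u2013" => "-",   # en dash
  "\u2014" => "-",   # em dash
  "\u2026" => "...", # horizontal ellipsis
  "\u00A0" => " "    # no-break space
}.freeze

# One regexp that matches any key of the table.
PUNCTUATION_PATTERN = Regexp.union(ASCII_PUNCTUATION.keys)

# Hypothetical helper: swap known typographic characters for ASCII ones.
# Characters that are not in the table are left untouched.
def asciify_punctuation(text)
  text.gsub(PUNCTUATION_PATTERN, ASCII_PUNCTUATION)
end

puts asciify_punctuation("It\u2019s \u201Cfine\u201D \u2013 really\u2026")
# prints: It's "fine" - really...
```

A hand-rolled table like this only handles the characters listed in it; a library is likely the better choice once the input gets more varied.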

Michael Papile answered Oct 05 '22 07:10


And in Ruby 2.0:

"ItΓÇÖs".encode("ASCII", invalid: :replace, undef: :replace, replace: '')
 => "Its" 
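
Note that the call above drops the apostrophe entirely rather than replacing it. If the goal is to keep an ASCII equivalent, one option is the `fallback:` option of `String#encode`, sketched below. The table `SUBSTITUTES`, the helper `to_ascii_keeping_punctuation`, and the choice of "?" for unknown characters are all assumptions made for this example.

```ruby
# Hypothetical table of preferred ASCII substitutes.
SUBSTITUTES = {
  "\u2018" => "'", "\u2019" => "'", # single quotation marks
  "\u201C" => '"', "\u201D" => '"', # double quotation marks
  "\u2013" => "-", "\u2014" => "-"  # en dash, em dash
}.freeze

# Hypothetical helper: convert UTF-8 text to US-ASCII. Each character with
# no ASCII form is passed to the fallback lambda, which returns the
# substitute from the table, or "?" when the table has no entry for it.
def to_ascii_keeping_punctuation(text)
  text.encode("US-ASCII", fallback: ->(char) { SUBSTITUTES.fetch(char, "?") })
end

puts to_ascii_keeping_punctuation("It\u2019s")
# prints: It's
```

The sketch assumes the input is valid UTF-8; byte sequences that are invalid in the source encoding would still raise unless `invalid: :replace` is added.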
fotanus answered Oct 05 '22 07:10