I mainly deal with the English alphabet and the usual punctuation marks, so I don't have to worry about European accents. My only concern is when a user pastes something copied from the web that includes, for instance, a typographic apostrophe. When I call puts on it in the console (on Win7), the output is:
"ItΓÇÖs" # whereas it actually is "It's"
So my main question is: is there a catch-all conversion method in Ruby that properly replaces all of the ,.;?!"'~` _- characters with their ASCII counterparts?
I understand very little about encodings. If you think this is the wrong question to ask, which may very well be the case, please advise me on what I should look for instead.
Thank you
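Ruby associates an encoding with every string. By default, the three primary ones used are UTF-8, US-ASCII, and ASCII-8BIT (aliased as BINARY). The encoding associated with a string can be changed in two ways: encode converts the bytes to the new encoding and raises an error on characters it cannot convert, while force_encoding only changes the label and leaves the bytes untouched, without any validation. It is therefore possible to create a string whose underlying byte sequence is invalid in its associated encoding.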
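A minimal sketch of the difference between relabeling and converting, using only core String methods. The variable names are just for illustration, and the sample text assumes the pasted apostrophe is U+2019.

```ruby
utf8 = "It\u2019s" # U+2019 is the typographic apostrophe

# force_encoding only relabels the string: same bytes, no validation.
binary = utf8.dup.force_encoding("ASCII-8BIT")
puts binary.encoding == Encoding::BINARY # true
puts binary.bytes == utf8.bytes          # true

# encode actually converts, and raises when the target has no such character.
conversion_failed = false
begin
  utf8.encode("US-ASCII")
rescue Encoding::UndefinedConversionError => e
  conversion_failed = true
  puts "cannot convert: #{e.message}"
end

# A truncated multi-byte sequence labeled as UTF-8 is an invalid string.
broken = [0xE2, 0x80].pack("C*").force_encoding("UTF-8")
puts broken.valid_encoding? # false
```

Checking valid_encoding? before processing user input is one way to catch such strings early.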
encoding is an instance method of the String class in Ruby that returns the Encoding object representing the string's encoding. Syntax: str.encoding. Parameters: none; str is the string the method is called on. Returns: an Encoding object.
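As a rough illustration, str.encoding can be used to see what Ruby thinks a pasted string is. The second half of this sketch reproduces the garbled output from the question under the assumption that the Windows console was using code page 437 (IBM437). The question does not state the code page, so treat that as a guess.

```ruby
text = "It\u2019s" # typographic apostrophe, stored as three UTF-8 bytes

puts text.encoding.name  # UTF-8
puts text.encoding.class # Encoding

# Assumption: the console reads the raw bytes as code page 437.
# Relabel the same bytes as IBM437, then transcode to UTF-8 for display.
garbled = text.dup.force_encoding("IBM437").encode("UTF-8")
puts garbled # ItΓÇÖs
```

If that assumption holds, the string itself is fine and only the console's interpretation of its bytes is off.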
ASCII is a 7-bit code. That is, it defines 128 characters, each identified by a seven-bit number. In practice each ASCII character is stored in one byte (eight bits) with the highest bit set to 0. A byte can hold any eight-digit binary value, such as 1101 1011 (base 2), but values above 0111 1111 fall outside ASCII.
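A small sketch of how this looks from Ruby, using core methods only. The sample strings are illustrative.

```ruby
code = "A".ord
puts code                 # 65
puts format("%08b", code) # 01000001 (the highest bit is 0)

plain = "It's"      # ASCII apostrophe (0x27)
fancy = "It\u2019s" # typographic apostrophe (U+2019)

puts plain.ascii_only? # true
puts fancy.ascii_only? # false

# The typographic apostrophe takes three bytes in UTF-8, each above 0x7F.
puts plain.bytes.length # 4
puts fancy.bytes.length # 6
```

ascii_only? is a quick way to tell whether a pasted string needs any cleanup at all.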
I work in publishing, where we deal with this a lot. We have had success with stringex (https://github.com/rsl/stringex). It provides a to_ascii method that normalizes Unicode dashes, etc.
And in Ruby 2.0 (note that this drops characters that have no ASCII equivalent rather than substituting them):
"ItΓÇÖs".encode("ASCII", invalid: :replace, undef: :replace, replace: '')
=> "Its"
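If the apostrophe should survive rather than be dropped, one option is to map the common typographic characters to ASCII first and only then strip whatever is left. This is a minimal sketch, not a complete solution: PUNCTUATION_MAP and to_plain_ascii are hypothetical names, the table covers only a handful of characters, and the input is assumed to be valid UTF-8.

```ruby
# Hypothetical mapping table; extend it with whatever your users paste.
PUNCTUATION_MAP = {
  "\u2018" => "'",   # left single quotation mark
  "\u2019" => "'",   # right single quotation mark / apostrophe
  "\u201C" => '"',   # left double quotation mark
  "\u201D" => '"',   # right double quotation mark
  "\u2013" => "-",   # en dash
  "\u2014" => "-",   # em dash
  "\u2026" => "...", # horizontal ellipsis
  "\u00A0" => " "    # no-break space
}.freeze

PUNCTUATION_PATTERN = Regexp.union(PUNCTUATION_MAP.keys)

def to_plain_ascii(text)
  # Substitute known characters, then drop anything else that is not ASCII.
  text.gsub(PUNCTUATION_PATTERN, PUNCTUATION_MAP)
      .encode("US-ASCII", invalid: :replace, undef: :replace, replace: "")
end

puts to_plain_ascii("It\u2019s \u201Cfine\u201D \u2013 really\u2026")
# prints: It's "fine" - really...
```

A hand-written table like this only handles the characters you list; a library such as stringex covers far more.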