Given a string "A\xC3B", it can be converted to a UTF-8 string by doing this (ref link):
"A\xC3B".force_encoding('iso-8859-1').encode('utf-8') #=> "AÃB"
However, I only want to perform the conversion if the string contains the ASCII code, namely \xC3. How can I check for that?
I tried "A\xC3B".include?("\x"), but it doesn't work.
\x is just a hexadecimal escape sequence. It has nothing to do with encodings on its own. US-ASCII goes from "\x00" to "\x7F" (e.g. "\x41" is the same as "A", and "\x30" is "0"). The rest ("\x80" to "\xFF"), however, are not US-ASCII characters, since US-ASCII is a 7-bit character set.
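To make the point about escapes and the 7-bit range concrete, here is a minimal sketch that inspects the raw bytes of the string from the question. The variable names (str, high) are only illustrative:

```ruby
# "\x41" and "A" are the same one-byte string; the escape is only notation.
puts "\x41" == "A"   # => true
puts "\x30" == "0"   # => true

# A string is pure US-ASCII when every byte is <= 0x7F.
str = "A\xC3B"
puts str.bytes.inspect                 # => [65, 195, 66]
puts str.bytes.all? { |b| b <= 0x7F }  # => false

# List the bytes that fall outside the US-ASCII range.
high = str.bytes.select { |b| b > 0x7F }
puts high.map { |b| format("0x%02X", b) }.inspect  # => ["0xC3"]
```

This is also why searching for "\x" cannot work: the escape exists only in the source code, and the string itself holds just the byte 0xC3.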
If you want to check whether a string contains only US-ASCII characters, call String#ascii_only?:
p "A\xC3B".ascii_only? # => false
p "\x41BC".ascii_only? # => true
Another example based on your code:
str = "A\xC3B"
unless str.ascii_only?
  str.force_encoding(Encoding::ISO_8859_1).encode!(Encoding::UTF_8)
end
p str.encoding # => #<Encoding:UTF-8>
I think what you want to do is figure out whether your string is properly encoded. The ascii_only? solution isn't much help when dealing with non-ASCII strings.
I would use String#valid_encoding? to verify whether a string is properly encoded, even if it contains non-ASCII characters.
For example, suppose someone else has already encoded "Françoise Paré" the right way (as UTF-8), so that decoding it gives the right string rather than "Fran\xE7oise Par\xE9" (which is what you would see if it had been encoded as ISO-8859-1). In that case ascii_only? returns false even though no conversion is needed.
[62] pry(main)> "Françoise Paré".encode("utf-8").valid_encoding?
=> true
[63] pry(main)> "Françoise Paré".encode("iso-8859-1")
=> "Fran\xE7oise Par\xE9"
# Note the encoding is still valid; this is just how the console
# displays ISO-8859-1 bytes.
[64] pry(main)> "Françoise Paré".encode("iso-8859-1").valid_encoding?
=> true
# Now let's interpret our 8859 string as UTF-8. In the following
# line, the string bytes don't change, `force_encoding` just makes
# Ruby interpret those same bytes as UTF-8.
[65] pry(main)> "Françoise Paré".encode("iso-8859-1").force_encoding("utf-8")
=> "Fran\xE7oise Par\xE9"
# Is a lone \xE7 valid UTF-8? Nope.
[66] pry(main)> "Françoise Paré".encode("iso-8859-1").force_encoding("utf-8").valid_encoding?
=> false
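Putting the session above together with the original question, one possible sketch converts a string only when its bytes are not valid UTF-8. The helper name ensure_utf8 and the fallback: keyword are hypothetical, and treating ISO-8859-1 as the fallback encoding is an assumption taken from the question:

```ruby
# Hypothetical helper: returns a UTF-8 copy of str.
# Assumption: anything that is not valid UTF-8 is really ISO-8859-1.
def ensure_utf8(str, fallback: Encoding::ISO_8859_1)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?   # already fine, leave the bytes alone

  # Reinterpret the same bytes as the fallback encoding, then transcode.
  str.dup.force_encoding(fallback).encode(Encoding::UTF_8)
end

good  = ensure_utf8("Françoise Paré")         # valid UTF-8, returned unchanged
fixed = ensure_utf8("Fran\xE7oise Par\xE9")   # ISO-8859-1 bytes, transcoded

puts good == fixed            # => true
puts fixed.valid_encoding?    # => true
```

Keep in mind this is a heuristic: some ISO-8859-1 byte sequences also happen to be valid UTF-8 (for example the bytes 0xC3 0xA9), and such strings would pass through untouched.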