How to check if a string contains ASCII code

Tags: string, ruby, utf-8

Given the string "A\xC3B", it can be converted to a UTF-8 string like this:

"A\xC3B".force_encoding('iso-8859-1').encode('utf-8') #=> "AÃB"

However, I only want to perform the action if the string contains the ASCII code, namely \xC3. How can I check for that?

I tried "A\xC3B".include?("\x"), but it doesn't work.

asked Jun 22 '15 21:06

sbs

2 Answers

\x is just a hexadecimal escape sequence; on its own it has nothing to do with encodings. US-ASCII covers "\x00" to "\x7F" (e.g. "\x41" is the same as "A", and "\x30" is "0"). The remaining bytes ("\x80" to "\xFF"), however, are not US-ASCII characters, since US-ASCII is a 7-bit character set.
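One way to see this in Ruby is to look at the raw bytes. The sketch below (assuming a UTF-8 source file, which is Ruby's default) shows that an escape like "\x41" is just another way to write "A", and that a byte above 0x7F, such as 0xC3, can be found with String#bytes instead of include?.

```ruby
# "\x41" is just another spelling of "A": the escape only names a byte value.
puts "\x41" == "A"   # prints true
puts "\x30" == "0"   # prints true

str = "A\xC3B"

# Each \xHH escape contributes exactly one byte with that value.
p str.bytes          # prints [65, 195, 66]

# US-ASCII covers bytes 0x00..0x7F; collect any byte outside that range.
non_ascii = str.bytes.select { |b| b > 0x7F }
p non_ascii.map { |b| format("0x%02X", b) }   # prints ["0xC3"]

# Checking for one specific byte value, e.g. 0xC3:
puts str.bytes.include?(0xC3)   # prints true
```

Working on the byte array avoids any question of how Ruby interprets the (invalid UTF-8) string as characters.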

If you want to check if a string contains only US-ASCII characters, call String#ascii_only?:

p "A\xC3B".ascii_only? # => false
p "\x41BC".ascii_only? # => true
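For ASCII-compatible encodings such as UTF-8, ascii_only? is true exactly when no byte is above 0x7F. The hypothetical helper ascii_bytes_only? below spells that rule out by hand so the two can be compared side by side. Note that a correctly encoded non-ASCII string like "Paré" also returns false.

```ruby
# Hypothetical helper: the rule String#ascii_only? applies for
# ASCII-compatible encodings, written out by hand.
def ascii_bytes_only?(s)
  s.bytes.all? { |b| b <= 0x7F }
end

samples = ["A\xC3B", "\x41BC", "Paré", ""]

samples.each do |s|
  # Both values should match for every UTF-8 string, valid or not.
  puts "#{s.inspect}: ascii_only?=#{s.ascii_only?} by hand=#{ascii_bytes_only?(s)}"
end
```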

Another example based on your code:

str = "A\xC3B"
unless str.ascii_only?
  str.force_encoding(Encoding::ISO_8859_1).encode!(Encoding::UTF_8)
end
p str.encoding # => #<Encoding:UTF-8>
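One caveat with this approach: ascii_only? is also false for text that is already valid UTF-8, so such text gets converted a second time. A minimal sketch, wrapping the code above in a hypothetical latin1_to_utf8 helper that works on a copy so the caller's string is untouched:

```ruby
# Hypothetical helper wrapping the conversion from the answer above.
def latin1_to_utf8(str)
  s = str.dup  # work on a copy so the caller's string is not mutated
  unless s.ascii_only?
    # Reinterpret the bytes as ISO-8859-1, then transcode to UTF-8.
    s.force_encoding(Encoding::ISO_8859_1).encode!(Encoding::UTF_8)
  end
  s
end

# Intended case: the stray byte 0xC3 becomes the character "Ã".
puts latin1_to_utf8("A\xC3B") == "AÃB"    # prints true

# Pitfall: "Paré" is already valid UTF-8, but é is not ASCII,
# so its two bytes (0xC3 0xA9) are re-read as "Ã©".
puts latin1_to_utf8("Paré") == "ParÃ©"    # prints true
```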

answered Sep 25 '22 10:09

cremno

I think what you want to do is figure out whether your string is properly encoded. The ascii_only? solution isn't much help when dealing with non-ASCII strings.

I would use String#valid_encoding? to verify whether a string is properly encoded, even if it contains non-ASCII chars.

For example, suppose someone else has encoded "Françoise Paré" correctly as UTF-8. When I decode it, I should get that same string back, not "Fran\xE7oise Par\xE9" (which is what you would see if it had been encoded as ISO-8859-1 instead).

[62] pry(main)> "Françoise Paré".encode("utf-8").valid_encoding?
=> true

[63] pry(main)> "Françoise Paré".encode("iso-8859-1")
=> "Fran\xE7oise Par\xE9"

# Note the encoding is still valid; this is just how pry (like IRB)
# displays an ISO-8859-1 string.

[64] pry(main)> "Françoise Paré".encode("iso-8859-1").valid_encoding?
=> true

# Now let's interpret our 8859 string as UTF-8. In the following
# line, the string bytes don't change, `force_encoding` just makes
# Ruby interpret those same bytes as UTF-8.

[65] pry(main)> "Françoise Paré".encode("iso-8859-1").force_encoding("utf-8")
=> "Fran\xE7oise Par\xE9"

# Is a lone \xE7 valid UTF-8? Nope.

[66] pry(main)> "Françoise Paré".encode("iso-8859-1").force_encoding("utf-8").valid_encoding?
=> false
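Putting the two answers together, one option is to convert only when the bytes are not already valid UTF-8. This is a sketch under that assumption: the to_utf8 helper is hypothetical, and it assumes any input that is not valid UTF-8 is ISO-8859-1.

```ruby
# Hypothetical helper: keep valid UTF-8 as-is, otherwise assume the
# bytes are ISO-8859-1 and transcode them to UTF-8.
def to_utf8(str)
  s = str.dup.force_encoding(Encoding::UTF_8)  # same bytes, read as UTF-8
  return s if s.valid_encoding?
  s.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

latin1 = "Françoise Paré".encode("iso-8859-1")

puts to_utf8("A\xC3B") == "AÃB"                    # prints true
puts to_utf8("Françoise Paré") == "Françoise Paré"  # prints true (left alone)
puts to_utf8(latin1) == "Françoise Paré"            # prints true (converted)
```

The check is a heuristic: Latin-1 text whose bytes happen to form valid UTF-8 (for example "Ã©", bytes 0xC3 0xA9) would be read as "é" and left unconverted.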

answered Sep 23 '22 10:09

Jonathan Allard
