Ruby 2.0.0 String#Match ArgumentError: invalid byte sequence in UTF-8

Tags: ruby, ruby-on-rails, ruby-on-rails-4, ruby-2.0

I see this a lot and haven't found a graceful solution. If user input contains invalid byte sequences, I need to handle it without raising an exception. For example:

    # @raw_response comes from user and contains invalid UTF-8
    # for example: @raw_response = "\xBF"
    regex.match(@raw_response)
    ArgumentError: invalid byte sequence in UTF-8

Numerous similar questions have been asked, and the usual answer is to encode or force-encode the string. Neither of these works for me, however:

    regex.match(@raw_response.force_encoding("UTF-8"))
    ArgumentError: invalid byte sequence in UTF-8

or

    regex.match(@raw_response.encode("UTF-8", :invalid=>:replace, :replace=>"?"))
    ArgumentError: invalid byte sequence in UTF-8

Is this a bug with Ruby 2.0.0 or am I missing something?

What is strange is that it appears to be encoding correctly, but match continues to raise an exception:

    @raw_response.encode("UTF-8", :invalid=>:replace, :replace=>"?").encoding
    => #<Encoding:UTF-8>

202

asked Jun 04 '14 11:06

Tom Rossi

1 Answer

In Ruby 2.0 the encode method is a no-op when encoding a string to its current encoding:

Please note that conversion from an encoding enc to the same encoding enc is a no-op, i.e. the receiver is returned without any changes, and no exceptions are raised, even if there are invalid bytes.

This changed in 2.1, which also added the scrub method as an easier way to do this.
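For anyone on 2.1 or later, a minimal sketch of what scrub looks like in practice. The variable names and the sample input are made up for illustration, and the string is assumed to be tagged as UTF-8. Note that the encoding label alone says nothing about validity, which is why checking .encoding in the question looked fine; valid_encoding? is the check that matters.

```ruby
# Hypothetical user input: tagged UTF-8 but containing an invalid byte (0xBF).
raw = "abc\xBFdef"

raw.encoding         # UTF-8 (only a label, the bytes are not checked)
raw.valid_encoding?  # false

# String#scrub (Ruby 2.1+) returns a copy with invalid sequences replaced.
clean = raw.scrub("?")

clean.valid_encoding?   # true
/def/.match(clean)      # matches without raising ArgumentError
```

Without an argument, scrub substitutes the Unicode replacement character (U+FFFD) instead of "?".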

If you are unable to upgrade to 2.1, you’ll have to encode into a different encoding and back in order to remove invalid bytes, something like:

    if ! s.valid_encoding?
      s = s.encode("UTF-16be", :invalid=>:replace, :replace=>"?").encode('UTF-8')
    end
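If the same code has to run on both 2.0 and newer Rubies, the two approaches can be wrapped in one place. This is only a sketch: sanitize_utf8 is a hypothetical helper name, not something from Ruby or Rails, and it assumes the input string is already tagged as UTF-8.

```ruby
# Hypothetical helper: returns a valid UTF-8 copy of str, replacing invalid
# byte sequences with `replacement`. Assumes str is tagged as UTF-8.
def sanitize_utf8(str, replacement = "?")
  return str if str.valid_encoding?

  if str.respond_to?(:scrub)
    # Ruby 2.1+: scrub handles this directly.
    str.scrub(replacement)
  else
    # Ruby 2.0: encoding to the same encoding is a no-op, so round-trip
    # through UTF-16BE to force the invalid bytes to be replaced.
    str.encode("UTF-16BE", :invalid => :replace, :replace => replacement)
       .encode("UTF-8")
  end
end

cleaned = sanitize_utf8("a\xBFb")
/b/.match(cleaned)   # no ArgumentError
```

Calling it once at the boundary where user input enters the app keeps the regex code itself free of encoding checks.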

122

answered Sep 23 '22 15:09

matt
