Is there a way in ruby 1.9 to remove invalid byte sequences from strings?

Tags: character-encoding, ruby, encoding, utf, ruby-1.9

Suppose you have a string like "€foo\xA0", encoded as UTF-8. Is there a way to remove invalid byte sequences from this string (so that you get "€foo")?

In Ruby 1.8 you could use Iconv.iconv('UTF-8//IGNORE', 'UTF-8', "€foo\xA0"), but that is now deprecated. "€foo\xA0".encode('UTF-8') doesn't do anything, since the string is already UTF-8. I tried:

"€foo\xA0".force_encoding('BINARY').encode('UTF-8', :undef => :replace, :replace => '')

which yields

"foo"

But that also loses the valid multibyte character €.
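To make the failure concrete, here is a small sketch of what the attempt above does to the bytes. The variable names (`s`, `hex`, `binary`, `result`) are only illustrative. Relabeling the string as binary makes every byte at or above 0x80 "undefined" for the conversion to UTF-8, which would explain why the three bytes of € disappear together with the stray \xA0.

```ruby
# encoding: utf-8

s = "€foo\xA0"

# "€" is the three bytes E2 82 AC in UTF-8; the trailing A0 is a stray byte.
hex = s.bytes.map { |b| format('%02X', b) }.join(' ')
puts hex

# Relabel the same bytes as binary (ASCII-8BIT); no bytes change here.
binary = s.dup.force_encoding('BINARY')

# Every byte >= 0x80 has no mapping from ASCII-8BIT to UTF-8, so all four
# high bytes (the three of "€" plus \xA0) count as undefined and are dropped.
result = binary.encode('UTF-8', :undef => :replace, :replace => '')
puts result
```

So the `:undef` option is acting on the valid and the invalid high bytes alike; the string needs to stay labeled as UTF-8 for the two to be told apart.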

234

asked Jan 03 '12 09:01

StefanH

2 Answers

"€foo\xA0".encode('UTF-16le', invalid: :replace, replace: '').encode('UTF-8')
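A possible reading of why this works: since encoding a UTF-8 string to UTF-8 is a no-op, going through a different encoding forces the transcoder to actually inspect the bytes, and `invalid: :replace` with an empty replacement drops whatever is not valid UTF-8. Below is a minimal sketch that wraps the one-liner in a helper; the name `strip_invalid_utf8` is made up for illustration and is not part of Ruby.

```ruby
# encoding: utf-8

# Hypothetical helper: round-trip through UTF-16LE so the transcoder runs
# and invalid UTF-8 byte sequences are replaced with the empty string.
def strip_invalid_utf8(str)
  str.encode('UTF-16LE', invalid: :replace, replace: '').encode('UTF-8')
end

cleaned = strip_invalid_utf8("€foo\xA0")
puts cleaned
puts cleaned.valid_encoding?
```

The intermediate encoding only needs to differ from UTF-8 and be able to represent every Unicode character, which UTF-16LE can.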

165

answered Sep 19 '22 19:09

Van der Hoorn

"€foo\xA0".chars.select(&:valid_encoding?).join
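As a rough illustration of this approach: `chars` yields a stray byte as its own one-byte string, and that string reports `valid_encoding?` as false, so it can be filtered out. The sketch also shows `String#scrub`, a different technique that is not mentioned in this thread and is only available on Ruby 2.1 and later; the `respond_to?` guard and the variable names are assumptions made for the example.

```ruby
# encoding: utf-8

s = "€foo\xA0"

# Split into characters; the stray \xA0 becomes a separate, invalid
# one-byte string, so select can drop it while keeping "€".
filtered = s.chars.select(&:valid_encoding?).join
puts filtered

# On Ruby 2.1+, String#scrub replaces invalid sequences directly.
# Fall back to the filtered result on older Rubies.
scrubbed = s.respond_to?(:scrub) ? s.scrub('') : filtered
puts scrubbed
```

The character-by-character filter builds one small string per character, so for large inputs the transcoding or `scrub` variants are likely to be cheaper.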

answered Sep 19 '22 19:09

Evgenii
