 

How to change deprecated iconv to String#encode for invalid UTF8 correction

I get sources from the web, and sometimes the material is not 100% valid UTF-8: it contains invalid byte sequences. I use Iconv to silently drop these sequences and get a cleaned string.

require 'iconv'
@iconv = Iconv.new('UTF-8//IGNORE', 'UTF-8')
valid_string = @iconv.iconv(untrusted_string)

However, Iconv is now deprecated, and I see its deprecation warning a lot:

iconv will be deprecated in the future, use String#encode

I tried converting it using String#encode's :invalid and :replace options, but it does not seem to work (i.e. the invalid byte sequences are not removed). What is the correct way to use String#encode for this?

Asked Dec 13 '22 by lulalala


1 Answer

This has been answered in this question:

Is there a way in ruby 1.9 to remove invalid byte sequences from strings?

Use either

untrusted_string.chars.select{|i| i.valid_encoding?}.join
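A minimal sketch of the first approach wrapped in a helper. The method name `strip_invalid_chars` and the sample input are hypothetical, and the input is assumed to be tagged UTF-8. `String#chars` splits an invalid byte into its own one-byte character, which `valid_encoding?` then rejects, so only well-formed characters survive the `join`.

```ruby
# Hypothetical helper: keeps only characters that are valid in the
# string's own encoding (assumed to be UTF-8 here) and rejoins them.
def strip_invalid_chars(str)
  str.chars.select(&:valid_encoding?).join
end

# "\xC3\xA9" is a valid UTF-8 "é"; "\xFF" is never valid in UTF-8.
dirty = "caf\xC3\xA9 \xFFok"
clean = strip_invalid_chars(dirty)
puts clean                  # prints café ok
puts clean.valid_encoding?  # prints true
```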

or

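untrusted_string.encode('UTF-16', 'UTF-8', :invalid => :replace, :replace => '').encode('UTF-8')

The intermediate UTF-16 step matters: at least in Ruby 1.9 and 2.0, String#encode treats a conversion from UTF-8 to UTF-8 as a no-op and ignores :invalid, which is why encoding straight to UTF-8 leaves the bad bytes in place.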
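A minimal sketch of the transcoding approach as reusable helpers, assuming the input bytes are meant to be UTF-8 (the method names and sample input are hypothetical). The conversion goes to UTF-16 rather than UTF-8 so that :invalid is actually applied, and passing 'UTF-8' as the explicit source encoding also covers strings tagged ASCII-8BIT, such as raw bytes read from the network. On Ruby 2.1 and later, `String#scrub` removes invalid sequences directly.

```ruby
# Hypothetical helper: reads the bytes as UTF-8 (explicit source encoding),
# transcodes to UTF-16 with invalid sequences replaced by '', then back to UTF-8.
def clean_utf8(str)
  str.encode('UTF-16', 'UTF-8', invalid: :replace, replace: '').encode('UTF-8')
end

# Hypothetical helper, Ruby 2.1+ only: String#scrub drops invalid sequences.
# dup avoids mutating (or failing on) a frozen input when re-tagging it.
def clean_utf8_scrub(str)
  str.dup.force_encoding('UTF-8').scrub('')
end

dirty = "caf\xC3\xA9 \xFFok"
puts clean_utf8(dirty)        # prints café ok
puts clean_utf8_scrub(dirty)  # prints café ok
puts clean_utf8(dirty.b)      # prints café ok (input tagged ASCII-8BIT)
```

The UTF-16 round trip also runs on Ruby 1.9; `scrub` is simpler when older versions need no support.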
Answered Dec 15 '22 by Martijn