
Invalid byte sequence in UTF-8 (ArgumentError)

I'm trying to run a Ruby script, and I always get an error on this line:

file_content.gsub(/dr/i,'med')

I'm trying to replace "dr" with "med".

The error is:

program.rb:4:in `gsub': invalid byte sequence in UTF-8 (ArgumentError)

Why does this happen, and how can I fix it?

I'm working on a Mac OS X Yosemite machine, with Ruby 2.2.1p85.

Simplicity asked Apr 26 '15 11:04



1 Answer

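Your string probably contains bytes that are not valid UTF-8. Replace those bytes before calling gsub, and keep the result, because gsub returns a new string instead of modifying file_content:

unless file_content.valid_encoding?
  # Round-trip through UTF-16BE so that invalid bytes are replaced with "?"
  file_content = file_content.encode("UTF-16be", :invalid=>:replace, :replace=>"?").encode('UTF-8')
end
file_content = file_content.gsub(/dr/i,'med')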
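
On Ruby 2.1 or later (the question uses 2.2.1), String#scrub is a shorter way to do the same cleanup. The following is only a sketch: the helper name replace_dr and the sample string are made up for illustration, and the "?" replacement character is an arbitrary choice.

```ruby
# Hypothetical helper: replace invalid bytes, then run the substitution
# from the question. String#scrub is available from Ruby 2.1 onward.
def replace_dr(text)
  # scrub returns a copy in which every invalid byte sequence becomes "?"
  clean = text.valid_encoding? ? text : text.scrub("?")
  # gsub returns a new string; the caller must keep the return value
  clean.gsub(/dr/i, "med")
end

# Sample input (an assumption): \xE9 followed by a space is not valid UTF-8.
file_content = "Dr. Jos\xE9 saw the dr today"
puts file_content.valid_encoding?   # false
puts replace_dr(file_content)
```

Replacing bytes loses information, so this suits cases where the stray bytes do not matter. If they are meaningful characters in some other encoding, transcoding from that encoding is the better fix.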

See "Ruby 2.0.0 String#Match ArgumentError: invalid byte sequence in UTF-8".
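
If the file is really text in another encoding, you can transcode it instead of replacing bytes. The sketch below assumes the data is ISO-8859-1; the question does not say what the file's encoding is, so treat that as a guess to verify. The helper name latin1_to_utf8 is hypothetical.

```ruby
# Hypothetical helper. ASSUMPTION: the raw bytes are ISO-8859-1 (Latin-1).
def latin1_to_utf8(raw)
  # String#b returns a binary copy, so the caller's string is left untouched.
  # force_encoding only relabels the bytes; encode performs the conversion.
  raw.b.force_encoding("ISO-8859-1").encode("UTF-8")
end

raw  = "Dr. Jos\xE9"            # 0xE9 is "é" in ISO-8859-1
text = latin1_to_utf8(raw)
puts text.valid_encoding?       # true
puts text.gsub(/dr/i, "med")

# When reading from disk, the same conversion can be requested at read time,
# for example: File.read(path, encoding: "ISO-8859-1:UTF-8")
```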

jon snow answered Sep 30 '22 18:09