I have a problem with UTF-8 encoding. I have read some posts here, but it still does not work properly.
This is my code:
#!/bin/env ruby
#encoding: utf-8

def determine
  file = File.open("/home/lala.txt")
  file.each do |line|
    puts(line)
    type = line.match(/DOG/)
    puts('aaaaa')
    if type != nil
      puts(type[0])
      break
    end
  end
end
These are the first 3 lines of my file:
;?lalalalal60000065535-1362490443-0000006334-0000018467-0000000041en-lalalalallalalalalalalalaln Cell Generation
text/lalalalala1.0.0.1515
text/lalalala�DOG
When I run this code, it shows an error exactly when reading the third line of the file (the line where the word DOG stands):
;?lalalalal60000065535-1362490443-0000006334-0000018467-0000000041en-lalalalallalalalalalalalaln Cell Generation
aaaaa
text/lalalalala1.0.0.1515
aaaaa
text/lalalala�DOG
/home/kik/Desktop/determine2.rb:16:in `match': invalid byte sequence in UTF-8 (ArgumentError)
BUT: if I run just the determine function with the following content:
#!/bin/env ruby
#encoding: utf-8

def determine
  type = "text/lalalala�DOG".match(/DOG/)
  puts(type)
end
it works perfectly.
What is going wrong there? Thanks in advance!
EDIT: The third line in the file is:
text/lalalal»DOG
BUT when I print the third line of the file in Ruby, it shows up as:
text/lalalala�DOG
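For anyone hitting the same thing, a quick diagnostic sketch (not part of the original question; the index 2 simply picks the third line) shows what Ruby actually read there and whether it is valid UTF-8:

# Diagnostic sketch: inspect what Ruby actually read on the third line.
line = File.open("/home/lala.txt", "r:UTF-8") { |f| f.readlines[2] }

puts line.encoding          # the encoding the string is tagged with (UTF-8 here)
puts line.valid_encoding?   # false if the byte shown as "�" is not valid UTF-8
p    line.bytes             # the raw byte values, so the offending byte is visible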
EDIT2:
This format was also developed to support localization. Strings stored within the file are stored as 2-byte UNICODE characters. The format of the file is a binary file with data stored in network byte order (big-endian format).
UTF-8 is an 8-bit variable-width encoding: it uses 1, 2, 3, or 4 bytes to represent a single Unicode code point. It is backward-compatible with ASCII: the first 128 Unicode code points (0-127) are encoded as the same single bytes as in ASCII, so existing ASCII text is already valid UTF-8. All other characters use two to four bytes.
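You can see the variable width directly in Ruby; the characters below are only illustrative examples, not taken from the file:

#encoding: utf-8
# bytesize shows how many bytes UTF-8 needs for each single character.
puts "A".bytesize    # => 1 (ASCII range, same byte as in ASCII)
puts "»".bytesize    # => 2
puts "€".bytesize    # => 3
puts "😀".bytesize   # => 4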
I believe @Amadan is close, but has it backwards. I'd do this:
File.open("/home/lala.txt", "r:ASCII-8BIT")
The character is not valid UTF-8, but for your purposes, it looks like 8-bit ASCII will work fine. My understanding is that Ruby is using that encoding by default when you just use the string, which is why that works.
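For completeness, here is the determine method from the question with that open mode plugged in. This is just a sketch and has not been run against the real file:

def determine
  # ASCII-8BIT (a.k.a. BINARY) treats every byte as an opaque 8-bit value,
  # so the stray byte before "DOG" no longer trips UTF-8 validation.
  File.open("/home/lala.txt", "r:ASCII-8BIT") do |file|
    file.each do |line|
      puts(line)
      type = line.match(/DOG/)   # /DOG/ is ASCII-only, so this match is safe
      if type != nil
        puts(type[0])
        break
      end
    end
  end
end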
Update: Based on your most recent comment, it sounds like this is what you need:
File.open("/home/lala.txt", "rb:UTF-16BE")
Try using this:
File.open("/home/lala.txt", "r:UTF-8")
There seems to be an issue with the wrong encoding being used at some stage. The #encoding: utf-8 magic comment specifies only the encoding of the source file, which affects how the literal string is interpreted; it has no effect on the encoding that File.open uses.
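A small demonstration of that distinction (the path is the one from the question; everything else is illustrative):

#encoding: utf-8

# The magic comment controls how literals in this source file are tagged:
puts "text/lalalala".encoding                  # => UTF-8

# File.open ignores the magic comment; without an explicit mode it falls
# back to Encoding.default_external, which is a separate setting:
puts Encoding.default_external
File.open("/home/lala.txt") { |f| puts f.external_encoding }

# Passing the encoding in the mode string makes the choice explicit:
File.open("/home/lala.txt", "r:UTF-8") { |f| puts f.external_encoding }  # => UTF-8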