Ruby Invalid Byte Sequence in UTF-8

I have the following code, which raises an invalid byte sequence error pointing to the scan call in initialize. Any ideas on how to fix this? For what it's worth, the error does not occur when I remove the (.*) between the h1 tag name and the closing >.

    #!/usr/bin/env ruby

    class NewsParser
      def initialize
        Dir.glob("./**/index.htm") do |file|
          @file = IO.read file
          parsed = @file.scan(/<h1(.*)>(.*?)<\/h1>(.*)<!-- InstanceEndEditable -->/im)
          self.write(parsed)
        end
      end

      def write output
        @contents = output
        open('output.txt', 'a') do |f|
          f << @contents[0][0]+"\n\n"+@contents[0][1]+"\n\n\n\n"
        end
      end
    end

    p = NewsParser.new

Edit: Here is the error message:

news_parser.rb:10:in 'scan': invalid byte sequence in UTF-8 (ArgumentError)

SOLVED: The combination of reading the file with @file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil) and adding the magic comment # encoding: UTF-8 at the top of the script solved the issue.

Thanks!

636

asked Mar 07 '12 19:03

redgem

1 Answer

The combination of reading the file with @file = IO.read(file).force_encoding("ISO-8859-1").encode("utf-8", replace: nil) and adding the magic comment # encoding: UTF-8 at the top of the script solved the issue.
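
For anyone who wants to see the effect in isolation, here is a minimal sketch of that fix. It assumes the files really are ISO-8859-1; the helper name latin1_to_utf8 and the sample string are made up for illustration and are not from the original script. The regex is shortened to the h1 part only.

```ruby
# encoding: UTF-8

# Hypothetical helper (not from the original post): reinterpret the raw bytes
# as ISO-8859-1, then transcode them to UTF-8.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding("ISO-8859-1").encode("UTF-8", replace: nil)
end

# Made-up sample standing in for IO.read(file): 0xE9 is "é" in ISO-8859-1,
# but on its own it is not a valid UTF-8 byte sequence.
raw = "<h1 class=\"news\">caf\xE9</h1>".b.force_encoding("UTF-8")
puts raw.valid_encoding?    # false

fixed = latin1_to_utf8(raw)
puts fixed.valid_encoding?  # true

# Shortened version of the pattern from the question.
parsed = fixed.scan(/<h1(.*)>(.*?)<\/h1>/im)
puts parsed[0][1]           # café
```

Note that force_encoding only relabels the bytes, while encode actually converts them. If the files are in some other single-byte encoding, the result will still be valid UTF-8 but some characters may come out wrong.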

125

answered Oct 21 '22 03:10

redgem
