
How can I open a Unicode file with Perl?

I'm using osql to run several SQL scripts against a database, and then I need to look at the results file to check whether any errors occurred. The problem is that Perl doesn't seem to like the fact that the results files are Unicode.

I wrote a little script to test it, and the output comes out all garbled:

$file = shift;

open OUTPUT, $file or die "Can't open $file: $!\n";
while (<OUTPUT>) {
    print $_;
    if (/Invalid|invalid|Cannot|cannot/) {
        push(@invalids, $file);
        print "invalid file - $file - schedule for retry\n";
        last;
    }            
}

Any ideas? I've tried decoding using decode_utf8 but it makes no difference. I've also tried to set the encoding when opening the file.

I think the problem might be that osql writes the results file in UTF-16, but I'm not sure. When I open the file in TextPad it just tells me 'Unicode'.

Edit: Using perl v5.8.8

Edit: Hex dump:

file name: Admin_CI.User.sql.results
mime type: 

0000-0010:  ff fe 31 00-3e 00 20 00-32 00 3e 00-20 00 4d 00  ..1.>... 2.>...M.
0000-0020:  73 00 67 00-20 00 31 00-35 00 30 00-30 00 37 00  s.g...1. 5.0.0.7.
0000-0030:  2c 00 20 00-4c 00 65 00-76 00 65 00-6c 00 20 00  ,...L.e. v.e.l...
0000-0032:  31 00                                            1.
Jaco Pretorius asked Mar 17 '10 11:03



1 Answer

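The answer is in the documentation for open, which also points you to perluniintro. :) Your hex dump starts with ff fe, the byte-order mark of little-endian UTF-16, so open the file with that encoding layer: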

open my $fh, '<:encoding(UTF-16LE)', $file or die ...;
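To show how that open call fits into the loop from the question, here is a minimal sketch. It assumes the results file is UTF-16LE with a leading BOM, as the hex dump suggests. The sample text it writes to a temporary file is made up for illustration (only "Msg 15007, Level 1" comes from the dump), and the case-insensitive pattern is a simplification of the original alternation.

```perl
use strict;
use warnings;
use Encode qw(encode);
use File::Temp qw(tempfile);

# Hypothetical stand-in for an osql results file: the BOM bytes ff fe
# followed by UTF-16LE text. The second line is invented sample data.
my $sample = "1> 2> Msg 15007, Level 1\r\nCannot open hypothetical object\r\n";
my ($tmp, $file) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "\xFF\xFE", encode('UTF-16LE', $sample);
close $tmp or die "Can't close $file: $!\n";

my @invalids;

# Decode while reading, so the regex sees characters instead of raw bytes.
open my $fh, '<:encoding(UTF-16LE)', $file
    or die "Can't open $file: $!\n";
while (my $line = <$fh>) {
    # UTF-16LE leaves the BOM in the data as U+FEFF; drop it from line 1.
    $line =~ s/^\x{FEFF}// if $. == 1;
    if ($line =~ /invalid|cannot/i) {
        push @invalids, $file;
        last;
    }
}
close $fh;

print "invalid file - $_ - schedule for retry\n" for @invalids;
```

If you would rather not strip the BOM yourself, the plain UTF-16 encoding reads the BOM to pick the byte order and removes it, but it refuses input that has no BOM.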

You can get a list of the names of the encodings that your perl supports:

% perl -MEncode -le "print for Encode->encodings(':all')"

After that, it's up to you to find out what the file encoding is. This is the same way you'd open any file with an encoding different than the default, whether it's one defined by Unicode or not.
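When a file starts with a byte-order mark, as the one in the question does, the first few bytes are usually enough to guess the encoding. The sketch below is one way to do that; guess_encoding_from_bom is a hypothetical helper, not part of Encode, and it only recognizes the common BOMs.

```perl
use strict;
use warnings;

# Hypothetical helper: map a leading byte-order mark to an Encode name.
# Takes the first raw bytes of a file; returns undef if no BOM is found.
sub guess_encoding_from_bom {
    my ($head) = @_;
    # UTF-32LE must be tested before UTF-16LE: both start with ff fe.
    return 'UTF-32LE' if $head =~ /^\xFF\xFE\x00\x00/;
    return 'UTF-32BE' if $head =~ /^\x00\x00\xFE\xFF/;
    return 'UTF-8'    if $head =~ /^\xEF\xBB\xBF/;
    return 'UTF-16LE' if $head =~ /^\xFF\xFE/;
    return 'UTF-16BE' if $head =~ /^\xFE\xFF/;
    return;
}

# First four bytes from the hex dump in the question: ff fe 31 00
my $guess = guess_encoding_from_bom("\xFF\xFE\x31\x00");
print "$guess\n";   # prints "UTF-16LE"
```

To use it on a real file, open the file with the '<:raw' layer, read the first four bytes, and pass them in. A file with no BOM returns undef, and then you need to know the encoding from some other source.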

We have a chapter in Effective Perl Programming that goes through the details.

brian d foy answered Oct 31 '22 22:10
