
How to convert a Unicode file to an ASCII file with a Perl script on a Windows machine

I have a file in Unicode format on a Windows machine. Is there any way to convert it to ASCII format on Windows using a Perl script?

The file is UTF-16 with a BOM.

ashokbabuy asked Dec 13 '22 08:12


2 Answers

If you want to convert Unicode to ASCII, be aware that some characters can't be converted because they simply don't exist in ASCII. If you can live with that, you can try this:

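#!/usr/bin/env perl
use strict;
use warnings;
use autodie;

# Default layers for every open() in this file: decode input as UTF-16
# (byte order taken from the BOM) and encode output as ASCII.
use open IN => ':encoding(UTF-16)';
use open OUT => ':encoding(ascii)';

my $buffer;

# Slurp the whole input; the size in bytes is an upper bound on the
# number of characters to read.
open(my $ifh, '<', 'utf16bom.txt');
read($ifh, $buffer, -s $ifh);
close($ifh);

# Write the decoded text back out through the ASCII layer.
open(my $ofh, '>', 'ascii.txt');
print($ofh $buffer);
close($ofh);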

If you do not have autodie, just remove that line. You should then add an explicit error check to each of your open/close statements:

open(...) or die "error: $!\n";

If the file contains characters that can't be converted, you will get warnings on the console and your output file will have escape sequences such as

\x{00e4}\x{00f6}\x{00fc}\x{00df}

in it. By the way, if the file has no BOM but you know it is big-endian (or little-endian), you can change the encoding line to

use open IN => ':encoding(UTF-16BE)';

or

use open IN => ':encoding(UTF-16LE)';
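
The BOM handling and the escape output described above can be checked in memory with the core Encode module. This is only a minimal sketch: the sample bytes are made up for illustration, and the variable names are not from the original answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: BOM (FF FE = little-endian), then "a", U+00E4, "b".
my $bytes = "\xFF\xFE" . "a\x00" . "\xE4\x00" . "b\x00";

# 'UTF-16' reads the BOM to pick the byte order and drops it from the result.
my $with_bom = decode('UTF-16', $bytes);

# 'UTF-16LE' fixes the byte order and keeps the BOM as the character U+FEFF.
my $fixed_order = decode('UTF-16LE', $bytes);

# FB_PERLQQ writes characters that ASCII cannot represent as \x{....} escapes.
my $escaped = encode('ascii', $with_bom, Encode::FB_PERLQQ);

# Without a CHECK argument, such characters are replaced with '?'.
my $replaced = encode('ascii', $with_bom);

printf "%d %d\n", length($with_bom), length($fixed_order);
print "$escaped\n$replaced\n";
```

This should print `3 4`, then `a\x{00e4}b`, then `a?b`. The extra character in the second length is the BOM, which is why UTF-16LE or UTF-16BE is best reserved for files without one.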

Hope it works under Windows as well. I can't give it a try right now.
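
One Windows-specific point to watch is line endings. Perl on Windows normally adds a :crlf layer by default, and it sits below the encoding layer, so it sees raw UTF-16 bytes and the decoded text may keep its carriage returns. A commonly suggested layer order for this case is sketched below. The sub name utf16_to_ascii is hypothetical, and the behaviour on a given Windows Perl build is an assumption worth checking.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use autodie;

# Hypothetical helper: copy a UTF-16 file (with BOM) to an ASCII file.
sub utf16_to_ascii {
    my ($in_name, $out_name) = @_;

    # :raw drops the platform's default :crlf layer, :encoding(UTF-16)
    # decodes the bytes using the BOM, and the trailing :crlf then maps
    # CRLF to "\n" on the decoded characters.
    open(my $ifh, '<:raw:encoding(UTF-16):crlf', $in_name);

    # ASCII is a single-byte encoding, so the platform's default line-ending
    # handling on the output side is safe here.
    open(my $ofh, '>:encoding(ascii)', $out_name);

    while (my $line = <$ifh>) {
        print {$ofh} $line;
    }

    close($ifh);
    close($ofh);
    return;
}

# Usage: perl convert.pl utf16bom.txt ascii.txt
utf16_to_ascii(@ARGV) if @ARGV == 2;
```

Spelling out the layers in the open call, rather than through the open pragma, keeps the order of the layers visible at the point where the file is read.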

Karsten S. answered Dec 15 '22 00:12


Take a look at the encoding option of Perl's open function. You can specify the encoding when opening a file for reading or writing. Something like this should work:

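#!/usr/bin/env perl
use strict;
use warnings;
use autodie;

# The question's file has a BOM, so plain UTF-16 lets Perl detect the byte
# order; use UTF-16BE or UTF-16LE only for files without a BOM.
open (my $utf16_fh, "<:encoding(UTF-16)", "test.utf16.txt");
# "test.ascii.txt" is a placeholder output name.
open (my $ascii_fh, ">:encoding(ASCII)", "test.ascii.txt");

# Copy line by line; each line is decoded on read and encoded on write.
while (my $line = <$utf16_fh>) {
    print $ascii_fh $line;
}

close $utf16_fh;
close $ascii_fh;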
David W. answered Dec 14 '22 22:12