I am using Spreadsheet::ParseExcel
to parse an Excel spreadsheet file as follows
my $FileName = "../excel.xls";
my $parser = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse($FileName);
And reading values from the cells like this
$product = $worksheeto->get_cell( $row, 0 )->value();
The problem is that, when there is a French character, for instance à
, it shows ò
To be sure that the there is no error in the parsing I used
print unpack('H*', $product) . "\n";
So when I use any online hex to string converter I do get the à
.
I also tried
use utf8;
binmode(STDOUT, ":utf8");
but instead of à
I get +á
Is there a way to get the correct characters?
Try parsing the file with a formatter, for example the Spreadsheet::ParseExcel::FmtUnicode:
use Spreadsheet::ParseExcel;
use Spreadsheet::ParseExcel::FmtUnicode;
#use Spreadsheet::ParseExcel::FmtJapan;
my $FileName = '../excel.xls';
my $parser = Spreadsheet::ParseExcel->new();
my $formatter = Spreadsheet::ParseExcel::FmtUnicode->new();
my $workbook = $parser->parse($FileName,$formatter);
Try also the FmtJapan, since the documentation says: The Spreadsheet::ParseExcel::FmtJapan formatter also supports Unicode. If you encounter any encoding problems with the default formatter try that instead.
*UPDATE:
I tried it by myself in a xls file with Greek characters but it didn't worked neither with FmtUnicode or FmtJapan . I then found this perlmonks post, used the provided My::Excel::FmtUTF8
module and worked successfully when printing the values of a cell with $cell->value()
.
I have tried what you describe and this works correctly here, once I enable the utf-8 output. I would guess you either have a weird excel file (you should post an example somewhere), or that your terminal is badly configured.
Dealing with character set issues is hard, because your terminal can me confusing you. So it is always a good idea to pipe the output into 'od -c' to see what you are getting. In my script I get this text from a spreadsheet I had lying around:
Value = Descripción
And when I pipe it through od:
0000000 V a l u e = D e
0000020 s c r i p c i 303 263 n \n
I can see that the ó is two bytes long, which suggests is UTF-8. To make sure, you can ask iconv to convert from the expected output charset to whatever you are using in your terminal:
iconv -f utf-8
If the input is not proper utf-8 it will bark at you and/or output even weirder garbage.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With