
How to detect the encoding of an uploaded CSV file

I have a data.csv file that must be uploaded to the server, parsed, and so on.

This file can have different encodings. I must detect the encoding and convert the file to UTF-8.

At the moment, the PHP function mb_detect_encoding always returns UTF-8. I tried:

<?php 
mb_detect_encoding(file_get_contents($_FILES["csv_uploadfile"]["tmp_name"]));

or

<?php 
mb_detect_encoding(file_get_contents($saved_file_path));

In both cases mb_detect_encoding returns UTF-8.

If I use the bash command

$ file -bi csv_import_1378376486.csv |awk -F "=" '{print $2}'

it returns iso-8859-1

So when I try

iconv --from-code=iso-8859-1 --to-code=utf-8 csv_import_1378382527.csv 

the output is not readable.

The real encoding is cp1251, but I can't detect it. Can anyone help me solve this problem?

asked Sep 05 '13 12:09 by Tony-M


1 Answer

As someone noted in the PHP docs:

If you try to use mb_detect_encoding() to detect whether a string is valid UTF-8, use the strict mode, it is pretty worthless otherwise.

So try passing true as the third (strict) argument when detecting the encoding:

mb_detect_encoding($str, mb_detect_order(), TRUE);

If you can predict some possible encodings, you can list them instead of using mb_detect_order().
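
When the set of expected encodings is small, one possible approach is sketched below. It assumes the upload is either valid UTF-8 or a single known legacy encoding (Windows-1251 here, as in the question). The function name to_utf8 and the $fallback parameter are hypothetical, not part of mbstring. Note that nearly any byte sequence is valid in a single-byte encoding, so the fallback is an assumption about the data rather than a true detection.

```php
<?php
// Hypothetical helper: return $raw as UTF-8.
// Assumption: input is either valid UTF-8 or encoded in $fallback.
function to_utf8(string $raw, string $fallback = 'Windows-1251'): string
{
    // mb_check_encoding() validates the bytes strictly against UTF-8.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }

    // Not valid UTF-8: treat the bytes as the assumed legacy encoding.
    return mb_convert_encoding($raw, 'UTF-8', $fallback);
}
```

With the upload from the question, this could be called as to_utf8(file_get_contents($_FILES["csv_uploadfile"]["tmp_name"])). If several legacy encodings are possible, this check alone cannot tell them apart.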

answered Sep 28 '22 08:09 by Kleskowy