Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

php fgetcsv and integer conversion (UTF8 x UCS-2 Little Endian)

Tags:

php

fgetcsv

UPDATE 4:

I've found (as suspected) that the files enconding is responsible for this. I converted the csv file from UCS-2 Little Endian to UTF-8 and everything works fine. Anyway I need to find some way to make PHP understands the input file correctly. Tried already sellocale without success.

--- original question ---

I have some data stored on a csv file in 'table' format and I want to store this data as (tag, tagtimestamp, tagvalue) on a database. But can't find a way to cast values stored on $v array to integer values:

$p=0;
$tag=array();
$dt=array();
$v=array();

$handle=fopen("sga/2013.10.054.vol2.csv", "r");

while(($row=fgetcsv($handle, 0, "\t"))!==FALSE){
    $num=count($row);
    $num--; // only even collumns
    if($p==0){
        for($c=1; $c<$num; $c+=2){
            array_push($tag, $row[$c]);
        }
    }else{
        array_push($dt, $row[0]);
        for($c=1; $c<$num;$c+=2){
            array_push($v, $row[$c]); // <<< here is my problem
        }
    }
    $p++;
}
fclose($handle);

echo "<pre>";
print_r($v);
echo "</pre>";

echoes:

Array(
[0] => 8701
[1] => 5281
[2] => 4341
[3] => 4360
[4] => 8701
[5] => 8239
[6] => 4631
[7] => 4115
[8] => 4123
[9] => 8239
[10] => 8409
[11] => 3978
[12] => 4192
[13] => 4216
[14] => 8409
[15] => 8916
[16] => 3325
[17] => 4444
[18] => 4472
[19] => 8916
[20] => 9550
[21] => 3286
[22] => 4763
[23] => 4789

...

tried to use

(int)$row[$c], 
intval($row[$c]), 
($row[$c]+0), 
settype($row[$c], "integer")
preg_replace('!\s+!', $row[$c]) // to clean up any messy char...

, etc...

I do interpolate values using:

$nv=0;
$sqls="";
foreach ($tag as $kt=>$t){
    foreach($dt as $kd=>$d){
        $f=($kd*5)+$kt;
        $de=explode(" ", $d);
        $d=implode("-", array_reverse(explode("/", $de[0])))." ".$de[1];
        $nv=$v[$f]; // <<<<< here the values must become integers but only returns zeroes
        $sqls.="('$t', '$d', $nv), ";
    }
}

// the final query is setted here

    $sql="";
$sql="INSERT IGNORE INTO aducao (tag, tagtime, tagvalue) VALUES "; 
$sqls=rtrim($sqls, ", ");
$sql.=$sqls;
echo $sql;

My file is UTF.8 without BOM formated.

and CSV file is tab (\t) separated.

Anyway if I left the "values" as it comes from csv my mysql database throws an error, and if I try to convert the values becomes zeroes.

UPDATE:

here's the mysql error if values is not converted. I guess that because if I copy and paste the sql string directly on mysql query browser the error does not occur, so some type of 'dirt' is into these values...

You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 1

UPDATE 2:

var_dump($v) shows:

array(3725) { [0]=> string(9) "8701" [1]=> string(9) "5281" [2]=> string(9) "4341" [3]=> string(9) "4360" [4]=> string(9) "8701" [5]=> string(9) "8239" [6]=> string(9) "4631" [7]=> string(9) "4115" [8]=> string(9) "4123" [9]=> string(9) "8239" [10]=> string(9) "8409" [11]=> string(9) "3978" [12]=> string(9) "4192" [13]=> string(9) "4216" [14]=> string(9) "8409" [15]=> string(9) "8916" [16]=> string(9) "3325" [17]=> string(9) "4444" [18]=> string(9) "4472" [19]=> string(9) "8916" [20]=> string(9) "9550" [21]=> string(9) "3286" [22]=> string(9) "4763" [23]=> string(9) "4789" [24]=> string(9) "9550" [25]=> string(11) "11506" [26]=> string(9) "3448" [27]=> string(9) "5738" [28]=> string(9) "5769" [29]=> string(11) "11506" [30]=> string(11) "11287" [31]=> string(9) "3690" [32]=> string(9) "5630" [33]=> string(9) "5656" [34]=> string(11) "11287" [35]=> string(11) "10639" [36]=> string(9) "4329" [37]=> string(9) "5312" [38]=> string(9) "5327" [39]=> string(11) "10639" [40]=> string(11) "11427" [41]=> string(9) "4973" [42]=> string(9) "5704" [43]=> string(9) "5721" [44]=> string(11) "11427" [45]=> string(11) "11688" [46]=> string(9) "5605" [47]=> string(9) "5841" [48]=> string(9) "5847" 

UPDATE 3:

part of $sql output

INSERT IGNORE INTO aducao (tag, tagtime, tagvalue) VALUES ('E054_VOL', '2013-10-01 00:00:00', 8701), ('E054_VOL', '2013-10-01 01:00:00', 8239), ('E054_VOL', '2013-10-01 02:00:00', 8409), ('E054_VOL', '2013-10-01 03:00:00', 8916), ('E054_VOL', '2013-10-01 04:00:00', 9550), ('E054_VOL', '2013-10-01 05:00:00', 11506), ('E054_VOL', '2013-10-01 06:00:00', 11287), ('E054_VOL', '2013-10-01 07:00:00', 10639), ('E054_VOL', '2013-10-01 08:00:00', 11427), ('E054_VOL', '2013-10-01 09:00:00', 11688), ('E054_VOL', '2013-10-01 10:00:00', 7973), ('E054_VOL', '2013-10-01 11:00:00', 2067), ('E054_VOL', '2013-10-01 12:00:00', 0), ('E054_VOL', '2013-10-01 13:00:00', 0), ('E054_VOL', '2013-10-01 14:00:00', 0), ('E054_VOL', '2013-10-01 15:00:00', 0), ('E054_VOL', '2013-10-01 16:00:00', 0), ('E054_VOL', '2013-10-01 17:00:00', 137), ('E054_VOL', '2013-10-01 18:00:00', 142), ('E054_VOL', '2013-10-01 19:00:00', 140), ('E054_VOL', '2013-10-01 20:00:00', 169), ('E054_VOL', '2013-10-01 21:00:00', 0), ('E054_VOL', '2013-10-01 22:00:00', 161), ('E054_VOL', '2013-10-01 23:00:00', 9275), ('E054_VOL', '2013-10-02 00:00:00', 7679), ('E054_VOL', '2013-10-02 01:00:00', 8996), ('E054_VOL', '2013-10-02 02:00:00', 9135), ('E054_VOL', '2013-10-02 03:00:00', 9569), ('E054_VOL', '2013-10-02 04:00:00', 10114), ('E054_VOL', '2013-10-02 05:00:00', 10612), ('E054_VOL', '2013-10-02 06:00:00', 10438), ('E054_VOL', '2013-10-02 07:00:00', 10945), ('E054_VOL', '2013-10-02 08:00:00', 11784), ('E054_VOL', '2013-10-02 09:00:00', 12355), ('E054_VOL', '2013-10-02 10:00:00', 12500), ('E054_VOL', '2013-10-02 11:00:00', 12045), ('E054_VOL', '2013-10-02 12:00:00', 12089), ('E054_VOL', '2013-10-02 13:00:00', 12413), ('E054_VOL', '2013-10-02 14:00:00', 13024), ('E054_VOL', '2013-10-02 15:00:00', 13444), ('E054_VOL', '2013-10-02 16:00:00', 12244), ('E054_VOL', '2013-10-02 17:00:00', 10139), ('E054_VOL', '2013-10-02 18:00:00', 8586), ('E054_VOL', '2013-10-02 19:00:00', 8146), ('E054_VOL', '2013-10-02 20:00:00', 8418), ('E054_VOL', '2013-10-02 21:00:00', 8857), ('E054_VOL', '2013-10-02 22:00:00', 9142), ('E054_VOL', '2013-10-02 23:00:00', 9296), ('E054_VOL', '2013-10-03 00:00:00', 10015), ('E054_VOL', '2013-10-03 01:00:00', 9256), ('E054_VOL', '2013-10-03 02:00:00', 9452), ('E054_VOL', '2013-10-03 03:00:00', 11136), ('E054_VOL', '2013-10-03 04:00:00', 110)
like image 536
Paulo Bueno Avatar asked Nov 26 '13 19:11

Paulo Bueno


1 Answers

Looking at your var_dump output it looks like PHP has read some unprintable/invisible characters. That is why you see string(9) "8409", where you can clearly count that there are only 4 digits and not 9.

What I would do in your situation:

1) understand whether the values are correct.

2) use some hex viewer/editor and open the file to see what characters are actually there.

3) one of the solutions is probably just to remove all the unneeded (invisible) characters by a simple preg_replace('/[^0-9]+/', '', $value); and then insert them in a SQL query.

like image 69
Ingus Avatar answered Oct 19 '22 03:10

Ingus