
fgetcsv is eating the first letter of a String if it's an Umlaut

I am importing contents from an Excel-generated CSV-file into an XML document like:

$csv = fopen($csvfile, 'r');
$words = array();

while (($pair = fgetcsv($csv)) !== FALSE) {
    array_push($words, array('en' => $pair[0], 'de' => $pair[1]));
}

The inserted data are English/German expressions.

I insert these values into an XML structure and output the XML as follows:

$dictionary = new SimpleXMLElement('<dictionary></dictionary>');
//do things
$dom = dom_import_simplexml($dictionary) -> ownerDocument;
$dom -> formatOutput = true;

header('Content-Type: text/xml; charset=utf-8'); // XML mime-type plus UTF-8 charset; Content-Encoding is for compression such as gzip, not for character sets

echo $dom -> saveXML();
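The "do things" step is not shown above, so here is a minimal sketch of how the pairs might be written into the XML. The helper name buildDictionaryXml and the element names entry, en and de are assumptions for illustration, not taken from the original code. It also sets the document encoding, because without it saveXML() tends to emit non-ASCII characters as numeric character references.

```php
<?php
// Hypothetical helper: the name and the element names "entry", "en", "de" are assumptions.
function buildDictionaryXml(array $words): string
{
    $dictionary = new SimpleXMLElement('<dictionary></dictionary>');

    foreach ($words as $word) {
        $entry = $dictionary->addChild('entry');
        // Property assignment escapes special characters such as "&".
        $entry->en = $word['en'];
        $entry->de = $word['de'];
    }

    $dom = dom_import_simplexml($dictionary)->ownerDocument;
    $dom->formatOutput = true;
    // Declare the encoding so the serialized document is labeled and written as UTF-8.
    $dom->encoding = 'UTF-8';

    return $dom->saveXML();
}
```

The returned string can then be echoed after sending the Content-Type header.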

This is working fine, yet I am encountering one really strange problem. When the first letter of a string is an umlaut (as in Österreich or Ägypten), the character is omitted, resulting in sterreich or gypten. If the umlaut is in the middle of the string (Russische Föderation), it gets transferred correctly. The same goes for characters like ß or é.

All files are UTF-8 encoded and served in UTF-8.

This seems rather strange and bug-like to me, but maybe I am missing something; there are a lot of smart people around here.

m90 asked Sep 12 '12 14:09



1 Answer

Ok, so this seems to be a bug in fgetcsv.

I am now processing the CSV data on my own (a little cumbersome), but it is working and I do not have any encoding issues at all.

This is (a not-yet-optimized version of) what I am doing:

$rawCSV = file_get_contents($csvfile);

$lines = preg_split ('/$\R?^/m', $rawCSV); //split on line breaks in all operating systems: http://stackoverflow.com/a/7498886/797194

foreach ($lines as $line) {
    array_push($words, getCSVValues($line));
}

The getCSVValues function is borrowed from elsewhere and is needed to deal with CSV lines like this one, where a quoted field contains commas:

"I'm a string, what should I do when I need commas?",Howdy there

It looks like:

function getCSVValues($string, $separator=","){

    $elements = explode($separator, $string);

    for ($i = 0; $i < count($elements); $i++) {
        $nquotes = substr_count($elements[$i], '"');
        if ($nquotes %2 == 1) {
            for ($j = $i+1; $j < count($elements); $j++) {
                if (substr_count($elements[$j], '"') %2 == 1) { // Look for an odd-number of quotes
                    // Put the quoted string's pieces back together again
                    array_splice($elements, $i, $j-$i+1,
                        implode($separator, array_slice($elements, $i, $j-$i+1)));
                    break;
                }
            }
        }
        if ($nquotes > 0) {
            // Remove first and last quotes, then merge pairs of quotes
            $qstr =& $elements[$i];
            $qstr = substr_replace($qstr, '', strpos($qstr, '"'), 1);
            $qstr = substr_replace($qstr, '', strrpos($qstr, '"'), 1);
            $qstr = str_replace('""', '"', $qstr);
        }
    }
    return $elements;

}

Quite a bit of a workaround, but it seems to work fine.
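For comparison, PHP also ships a built-in str_getcsv() that parses a single CSV line, including quoted commas. This is a different technique from the hand-rolled function above, shown only as a sketch on the sample line. I would assume it shares its parsing logic with fgetcsv and may therefore be just as sensitive to the locale, so it is worth testing with leading umlauts before relying on it.

```php
<?php
// The sample line from above, parsed with the built-in str_getcsv().
// Delimiter, enclosure and escape are passed explicitly instead of relying on defaults.
$line = '"I\'m a string, what should I do when I need commas?",Howdy there';
$fields = str_getcsv($line, ',', '"', '\\');

print_r($fields);
```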

EDIT:

There is also a filed bug for this; apparently the behavior depends on the locale settings.
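If the locale is indeed the cause, switching LC_CTYPE to a UTF-8 locale before calling fgetcsv might make the workaround unnecessary. The following is a sketch under that assumption. The helper name readPairs and the candidate locale names are hypothetical, and which locales are actually installed varies from system to system.

```php
<?php
// Hypothetical helper: reads English/German pairs with fgetcsv after
// switching LC_CTYPE to a UTF-8 locale. The candidate locales are assumptions.
function readPairs(string $csvfile, array $locales = ['de_DE.UTF-8', 'en_US.UTF-8', 'C.UTF-8']): array
{
    // setlocale() tries each candidate in order and returns false if none can be set.
    if (setlocale(LC_CTYPE, $locales) === false) {
        trigger_error('No UTF-8 locale available; leading umlauts may still be dropped', E_USER_WARNING);
    }

    $csv = fopen($csvfile, 'r');
    if ($csv === false) {
        throw new RuntimeException("Cannot open $csvfile");
    }

    $words = [];
    // Delimiter, enclosure and escape are passed explicitly instead of relying on defaults.
    while (($pair = fgetcsv($csv, 0, ',', '"', '\\')) !== false) {
        if (count($pair) < 2) {
            continue; // skip blank or malformed lines
        }
        $words[] = ['en' => $pair[0], 'de' => $pair[1]];
    }
    fclose($csv);

    return $words;
}
```

Calling readPairs($csvfile) would then replace the fopen/fgetcsv loop from the question; if the warning fires, the manual parsing above remains the fallback.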

m90 answered Nov 14 '22 01:11