I'm attempting to parse a set of CSV data using PHP, but having a major issue. One of the fields is a long description field, which itself contains linebreaks within the enclosures.
My primary issue is writing a piece of code that can split the data line by line, but also recognize when linebreaks within the data should not be used. The linebreaks within this field are not properly escaped, making them hard to distinguish from legitimate linebreaks.
I've tried to come up with a regular expression that can properly handle it, but had no luck so far. Any ideas?
CSV format:
"####","text data here", "text data \n with linebreaks \n here"\n
"####","more text data", "more data \n with \n linebreaks \n here"\n
Press and hold the Alt key and then enter “010” from your keyboard's 10-keypad part. (Note: The numbers in the top row of the keyboard won't work.) If '010' doesn't work try '013' because the data imported from a different source might have line breaks represented by a different code.
Adding "sep=;" or "sep=," to the CSV file Here are the steps you should follow: Open your CSV using a text editor. Skip a line at the top, and add sep=; if the separator used in the CSV is a semicolon (;), or sep=, if the separator is a comma (,). Save, and re-open the file.
According to aleske, a commenter in the documentation for PHP's fgetcsv function:
The PHP's CSV handling stuff is non-standard and contradicts with RFC4180, thus fgetcsv() cannot properly deal with files [that contain line breaks] ...
And he offered up the following function to get around this limitation:
function csvstring_to_array(&$string, $CSV_SEPARATOR = ';', $CSV_ENCLOSURE = '"', $CSV_LINEBREAK = "\n") {
$o = array();
$cnt = strlen($string);
$esc = false;
$escesc = false;
$num = 0;
$i = 0;
while ($i < $cnt) {
$s = $string[$i];
if ($s == $CSV_LINEBREAK) {
if ($esc) {
$o[$num] .= $s;
} else {
$i++;
break;
}
} elseif ($s == $CSV_SEPARATOR) {
if ($esc) {
$o[$num] .= $s;
} else {
$num++;
$esc = false;
$escesc = false;
}
} elseif ($s == $CSV_ENCLOSURE) {
if ($escesc) {
$o[$num] .= $CSV_ENCLOSURE;
$escesc = false;
}
if ($esc) {
$esc = false;
$escesc = true;
} else {
$esc = true;
$escesc = false;
}
} else {
if ($escesc) {
$o[$num] .= $CSV_ENCLOSURE;
$escesc = false;
}
$o[$num] .= $s;
}
$i++;
}
// $string = substr($string, $i);
return $o;
}
That looks like it will do the trick.
I found that you can use a normal CSV parser after you convert the CSV to unix format.
Here is a function that did the trick for me .
function dos2unix($s) {
$s = str_replace("\r\n", "\n", $s);
$s = str_replace("\r", "\n", $s);
$s = preg_replace("/\n{2,}/", "\n\n", $s);
return $s;
}
And a parsing function
function csvstring_to_array($string, $separatorChar = ',', $enclosureChar = '"', $newlineChar = PHP_EOL) {
// @author: Klemen Nagode
$string = dos2unix($string);
$array = array();
$size = strlen($string);
$columnIndex = 0;
$rowIndex = 0;
$fieldValue="";
$isEnclosured = false;
for($i=0; $i<$size;$i++) {
$char = $string{$i};
$addChar = "";
if($isEnclosured) {
if($char==$enclosureChar) {
if($i+1<$size && $string{$i+1}==$enclosureChar){
// escaped char
$addChar=$char;
$i++; // dont check next char
}else{
$isEnclosured = false;
}
}else {
$addChar=$char;
}
}else {
if($char==$enclosureChar) {
$isEnclosured = true;
}else {
if($char==$separatorChar) {
$array[$rowIndex][$columnIndex] = $fieldValue;
$fieldValue="";
$columnIndex++;
}elseif($char==$newlineChar) {
echo $char;
$array[$rowIndex][$columnIndex] = $fieldValue;
$fieldValue="";
$columnIndex=0;
$rowIndex++;
}else {
$addChar=$char;
}
}
}
if($addChar!=""){
$fieldValue.=$addChar;
}
}
if($fieldValue) { // save last field
$array[$rowIndex][$columnIndex] = $fieldValue;
}
return $array;
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With