Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

fputcsv fails to escape entire cells containing a slashed double-quote: \"

Tags:

php

csv

I am building a CSV export tool. It's been working well for months, but we've recently encountered a couple of cases where there are extra lines caused by a cell "exploding".

I've narrowed the problem down to a bug in PHP (7.2.21 and others). I need to work around this. Below is the PHP script to reproduce the problem. It includes commas and linebreaks in each cell:

<?php
$data = array(
    array( 'ID', 'Name', 'Content', 'Date' ),
    array( 34, 'Radley', 'This is <strong>bold text</strong>, and' . "\r\n" . 'a second line, the first time', '2019-08-23' ),
    array( 47, 'John', 'This a <a href=\"http://example.org/\">link</a>' . "\r\n" . 'a second line, again', '2019-08-24' ),
    array( 65, 'Bob', 'This plain text, with no html, and lots of commas'. "\r\n" . 'and a third extra row', '2019-08-25' ),
);

$fh = fopen('php://output', 'w');
foreach( $data as $row ) fputcsv( $fh, $row );
echo stream_get_contents( $fh );
fclose( $fh );
?>

If we print_r the $data we can see this is only four values:

[2] => Array (
    (
        [0] => 47
        [1] => John
        [2] => This a <a href=\"http://example.org/\">link</a>
a second line, again
        [3] => 2019-08-24
    )
)

So I would expect this code to generate four rows with four columns each, but the third item by "John" is only 3 columns and adds an extra row with three columns as well.

badly formatted csv

The problem is not just that \" is already escaped. Having that in the string disables fputcsv from escaping EVERYTHING ELSE.

This malformed data can be seen in Excel for Windows and Mac, and also Google Sheets in Windows Chrome, and probably everything else. But if you load it back in to PHP with fgetcsv it works, so PHP must do this intentionally?


I need to correct this bug so that \" gets escaped properly into \"", and so all other commas and double-quotes get escaped as well. The plugin I am building exports content generated by other plugins, and those might have escaped data that needs to stay escaped with their own mechanisms.

Is there any way to ensure that each cell gets escaped, even if the cell has backslashes and quotes?

like image 867
Radley Sustaire Avatar asked Aug 24 '19 00:08

Radley Sustaire


1 Answers

The PHP manual stated that

If an enclosure character is contained in a field, it will be escaped by doubling it, unless it is immediately preceded by an escape_char.

And this is your case! you used fputcsv with it's default parameters, the " as the enclosure and \ as the escape_char, so the function left \" and didn't turn it to \""

###How to fix ? If you are using PHP >= 7.4.0 then just disable the proprietary escape mechanism. by supplying an empty string in the escape_char parameter

 fputcsv( $fh, $row, ',', '"' , '');

if you are before 7.4.0 then searching has shown me this workaround : pass "\0" as the escape_char parameter. (I tested it and it worked on your example)

 fputcsv( $fh, $row, ',', '"' , "\0");

This is the widely used hack for this problem according to my search, however Christoph M. Becker stated in his proposal "Kill CSV escaping" ( looks like you are not the only one who is angry with fputcsv CSV escaping :D )

While in many cases passing “\0” as $escape parameter will yield the desired results, this won't work if someone is writing/reading binary CSV files, may have issues with some non ASCII compatible encodings, and is generally to be regarded as a hack.

###What about the standards ?

RFC 4180 about CSV files has stated

If double-quotes are used to enclose fields, then a double-quote appearing inside a field must be escaped by preceding it with another double quote

There is nothing called escaping in standard CSV !! just a double quotes as the enclosure and escape it if it appears in the string with another double quotes. May be PHP wanted to support non-standard CSV files, I don't know!

After some searching and testing, it turns out this is a known issue with CSV files PHP functions fgetcsv and fputcsv. here are some other links I found during the search you might find it interesting:

What is wrong with csv in PHP

PHP RFC: Kill proprietary CSV escaping mechanism

Backslash on CSV is escaping the double quote but it should not

fgetcsv/fputcsv $escape parameter fundamentally broken

data gets garbled when writing to csv with fputcsv-fgetcsv

go home fputcsv youre drunk

like image 195
Accountant م Avatar answered Nov 19 '22 00:11

Accountant م