Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

fgetcsv/fputcsv $escape parameter fundamentally broken

Tags:

php

csv

Overview

fgetcsv and fputcsv support an $escape argument, however, it's either broken, or I'm not understanding how it's supposed to work. Ignore the fact that you don't see the $escape parameter documented on fputcsv, it is supported in the PHP source, there's a small bug preventing it from coming through in the documentation.

The function also supports $delimiter and $enclosure parameters, defaulting to a comma and a double quote respectively. I would expect the $escape parameter should be passed in order to have a field containing any one of those metacharacters (backslash, comma or double quote), however this certainly isn't the case. (I now understand from reading Wikipedia, these are to be enclosed in double-quotes).

What I've tried

Take for example the pitfall that has affected numerous posters in the comments section from the fgetcsv documentation. The case where we'd like to write a single backslash to a field.

$r = fopen('/tmp/test.csv', 'w');
fwrite($r, '"\"');
fclose($r);

$r = fopen('/tmp/test.csv', 'r');
var_dump(fgetcsv($r));
fclose($r);

This returns false. I've also tried "\\", however that also returns false. Padding the backslash(es) with some nebulous text gives fgetcsv the boost it needs... "hi\\there" and "hi\there" both parse and have the same result, but the result has only 1 backslash, so what's the point of the $escape at all?

I've observed the same behavior when not enclosing the backslash in double quotes. Writing a 'CSV' file containing the string \, and \\, have the same result when parsed by fgetcsv, 1 backslash.

Let's ask PHP how it might encode a backslash as a field in a CSV using fputcsv

$r = fopen('/tmp/test.csv', 'w');
fputcsv($r, array('\\'));
fclose($r);
echo file_get_contents('/tmp/test.csv');

The result is a double-quote enclosed single backslash (and I've tried 3 versions of PHP > 5.5.4 when $enclose support was supposedly added to fputcsv). The hilarity of this is that fgetcsv can't even read it properly per my notes above, it returns false... I'd expect fputcsv not to enclose the backslash in double quotes or fgetcsv to be able to read "\" as fputcsv has written it..., or really in my apparently misconstrued mind, for fputcsv to write a double quote enclosed pair of backslashes and for fgetcsv to be able to properly parse it!

Reproducible Test

Try writing a single quote to a file using fputcsv, then reading it via fgetcsv.

$aBackslash = array('\\');

// Write a single backslash to a file using fputcsv
$r = fopen('/tmp/test.csv', 'w');
fputcsv($r, $aBackslash);
fclose($r);

// Read the file using fgetcsv
$r = fopen('/tmp/test.csv', 'r');
$aFgetcsv = fgetcsv($r);
fclose($r);

// Compare the read value from fgetcsv to our original value
if(count(array_diff($aBackslash, $aFgetcsv)))
  echo "PHP CSV support is broken\n";

Questions

Taking a step back I have some questions

  • What's the point of the $escape parameter?
  • Given the loose definition of CSV files, can it be said PHP is supporting them correctly?
  • What's the 'proper' way to encode a backslash in a CSV file?

Background

I initially discovered this when a co-worker provided me a CSV file produced from Python, which wrote out a single backslash enclosed by double quotes and after fgetcsv failed to read it. I had the gaul to ask him if he could use a standard Python function. Little did I know the PHP CSV toolkit is a tangled mess! (FWIW: the Python dev tells me he's using the CSV writing module).

like image 668
quickshiftin Avatar asked Nov 10 '14 09:11

quickshiftin


2 Answers

From a quick look at Python's documentation on CSV Format Parameters, the escape character used within enclosed values (i.e. inside double quotes) is another double quote.

For PHP, the default escape character is a backslash (^); to match Python's behaviour you need to use this:

$data = fgetcsv($r, 0, ',', '"', '"');

(^) Actually fgetcsv() treats both $enclosure||$enclosure and $escape||$enclosure in the same way, so the $escape argument is used to avoid treating the backslash as a special character.

(^^) Setting the $length parameter to 0 instead of a fixed hard limit makes it less efficient.

like image 108
Ja͢ck Avatar answered Nov 15 '22 19:11

Ja͢ck


EDIT 2

So after sleep and a relook at the code, turns out fputcsv doesn't accept the escape parameter, and I was being stupid. I've updated the code below to proper working code. The same basic principle applies, the escape parameter is there to alter the escape parameter so you can load a CSV with backslashes without them being treated as escape characters. The trick is to use a character that isn't contained within the csv. You can do this by grepping the file for a specific character, until you find one that isn't returned.

EDIT

Ok, so the verdict is that it checks for the escape char, and then never stops checking. So, if it finds it, it's escaped. That simple.

That said, the purpose of the escape parameter is to allow for this exact situation, where you can alter the escape char to a character that isn't needed.

Here I've converted your example code to a working code:

$aBackslash = array('\\');

// Write a single backslash to a file using fputcsv
$r = fopen('/tmp/test.csv', 'w');
fputcsv($r, $aBackslash, ',', '"'); // EDIT 2: Removed escape param that causes PHP Notice.
fclose($r);

// Read the file using fgetcsv
$r = fopen('/tmp/test.csv', 'r');
$aFgetcsv = fgetcsv($r, ',', '"', '#');
fclose($r);

// Compare the read value from fgetcsv to our original value
if(count(array_diff($aBackslash, $aFgetcsv)))
  echo "PHP CSV support is broken\n";
else
  echo "PHP WORKS!\n";

One important caveat is that both fgetcsv and fputcsv must have the same parameters, otherwise the returned array will not match up to the original array.

ORIGINAL ANSWER

You are very correct. This is a failing with the language. I've tried every permutation of slashes that I can think of, and I've yet to actually achieve a successful response from the CSV. It always returns just as your example says.

I think what @deceze was mention is that in your example you use array('\\') which is actually the string literal "\" which PHP interprets as such, and passes "\" to the CSV, which is then returned that way. This returns the erroneous response \", which, as I stated above, is definitely wrong.

I did manage to find a work around, so that the result is actually appropriate:

First, for your example we'll either need to generate /tmp/test.csv in with "\" as the body, or alter the array slightly. Easiest method is just changing the array to:

array('"\\\\"');

After that, we should change up the fgetcsv request a bit.

$aFgetcsv = fgetcsv($r);
$aFgetcsv = array_map('stripslashes', $aFgetcsv);

By doing this, we're telling PHP to strip the first slash, thus making the string within $aFgetcsv "\"

like image 28
Travis Weston Avatar answered Nov 15 '22 20:11

Travis Weston