I have seen multiple threads about the best way to auto-detect the delimiter of an incoming CSV file. Most of the answers are functions of 20 to 30 lines, with multiple loops, a predetermined list of delimiters, reading the first 5 lines and matching counts, etc.
Here is one example.
I have just implemented that procedure with a few modifications, and it works brilliantly.
THEN I found the following code:
private function DetectDelimiter($fh)
{
    $data_1 = null;
    $data_2 = null;
    $delimiter = self::$delim_list['comma'];
    foreach (self::$delim_list as $key => $value)
    {
        $data_1 = fgetcsv($fh, 4096, $value);
        $delimiter = sizeof($data_1) > sizeof($data_2) ? $key : $delimiter;
        $data_2 = $data_1;
    }
    $this->SetDelimiter($delimiter);
    return $delimiter;
}
This to me looks like it's achieving the SAME results, where $delim_list is an array of delimiters as follows:
static protected $delim_list = array(
    'tab'       => "\t",
    'semicolon' => ";",
    'pipe'      => "|",
    'comma'     => ","
);
Can anyone shed any light as to why I shouldn't do it this simpler way, and why everywhere I look the more convoluted solution seems to be the accepted answer?
Thanks!
Here are the steps you should follow: Open your CSV in a text editor. Insert a new line at the very top containing sep=; if the separator used in the CSV is a semicolon (;), or sep=, if the separator is a comma (,). Save, and re-open the file.
The fgetcsv() function parses a line from an open file, checking for CSV fields.
PHP has two built-in functions for reading CSV data. fgetcsv() reads one line from an open file handle and parses it into fields. str_getcsv() parses CSV data that is already stored in a string variable.
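As a minimal sketch of the difference between the two functions: the sample data and the use of a php://memory stream are illustrative choices, not something taken from the posts here. The enclosure and escape arguments are passed explicitly and are simply PHP's default values.

```php
<?php
// Made-up sample data, for illustration only.
$csv = "name;city\nAda;London\n";

// fgetcsv() reads and parses one line at a time from an open handle.
$fh = fopen('php://memory', 'r+');
fwrite($fh, $csv);
rewind($fh);
$header   = fgetcsv($fh, 4096, ';', '"', '\\');
$firstRow = fgetcsv($fh, 4096, ';', '"', '\\');
fclose($fh);

// str_getcsv() parses a single CSV line that is already held in a string.
$parsed = str_getcsv('Ada;London', ';', '"', '\\');
```

Both functions need to be told the delimiter up front, which is why it has to be detected before the real parsing starts.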
This function is elegant :)
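/**
 * @param string $csvFile Path to the CSV file
 * @return string Delimiter
 */
public function detectDelimiter($csvFile)
{
    $delimiters = [";" => 0, "," => 0, "\t" => 0, "|" => 0];

    // Only the first line of the file is inspected.
    $handle = fopen($csvFile, "r");
    $firstLine = fgets($handle);
    fclose($handle);

    // Count the fields each candidate delimiter produces on that line.
    foreach ($delimiters as $delimiter => &$count) {
        $count = count(str_getcsv($firstLine, $delimiter));
    }
    unset($count); // break the reference left behind by foreach

    // On a tie, the first candidate in the list wins.
    return array_search(max($delimiters), $delimiters);
}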
Fixed version.
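In your code, each candidate is compared only with the previous candidate's field count, not with the best count so far, so a line that contains more than one of the delimiters gives a wrong result (example: val; string, with comma;val2;val3). It also fails when the file has fewer rows than there are candidate delimiters (for example, a single row), because every fgetcsv() call reads the next line instead of re-reading the first one.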
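To make the first problem concrete, here is a small sketch that counts the fields each candidate delimiter yields for the example line. The helper name countFieldsPerDelimiter is made up for this illustration, and str_getcsv() stands in for the fgetcsv() call in the question.

```php
<?php
/**
 * Hypothetical helper: number of fields each candidate delimiter
 * yields when the given line is parsed with it.
 */
function countFieldsPerDelimiter(string $line, array $delimiters): array
{
    $counts = [];
    foreach ($delimiters as $d) {
        // Enclosure and escape are PHP's defaults, passed explicitly.
        $counts[$d] = count(str_getcsv($line, $d, '"', '\\'));
    }
    return $counts;
}

$line   = 'val; string, with comma;val2;val3';
$counts = countFieldsPerDelimiter($line, ["\t", ';', '|', ',']);
// $counts: tab => 1, ';' => 4, '|' => 1, ',' => 2
```

Comparing each count only with the previous candidate's count ends on the comma, because 2 is greater than the pipe's 1. Comparing with the best count so far keeps the semicolon, which has 4 fields.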
Here is a fixed variant:
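private function detectDelimiter($fh)
{
    $delimiters = ["\t", ";", "|", ","];
    $delimiter = $delimiters[0];
    $maxCount = 0;
    foreach ($delimiters as $d) {
        // Always test the same first line.
        rewind($fh);
        $row = fgetcsv($fh, 4096, $d);
        // fgetcsv() returns false at EOF; count() on a non-array throws in PHP 8.
        $count = is_array($row) ? count($row) : 0;
        // Compare with the best count so far, not the previous candidate.
        if ($count > $maxCount) {
            $delimiter = $d;
            $maxCount = $count;
        }
    }
    rewind($fh); // leave the handle at the start for the caller
    return $delimiter;
}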
None of these worked for my use case. So I made some slight modifications.
/**
 * @param string $filePath
 * @param int $checkLines Number of lines to sample
 * @return string
 */
public function getCsvDelimiter(string $filePath, int $checkLines = 3): string
{
    $delimiters = [",", ";", "\t"];
    $default = ",";
    $fileObject = new \SplFileObject($filePath);
    $results = [];
    $counter = 0;
    // Read at most $checkLines lines.
    while ($fileObject->valid() && $counter < $checkLines) {
        $line = $fileObject->fgets();
        foreach ($delimiters as $delimiter) {
            // Plain split: delimiters inside quoted fields are counted too.
            $totalFields = count(explode($delimiter, $line));
            if ($totalFields > 1) {
                $results[$delimiter] = ($results[$delimiter] ?? 0) + $totalFields;
            }
        }
        $counter++;
    }
    if (!empty($results)) {
        // The delimiter with the highest total field count wins.
        $results = array_keys($results, max($results));
        return $results[0];
    }
    return $default;
}
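One reason the longer solutions compare counts across several lines is that a real delimiter tends to give the same number of fields on every row, while a character that merely appears inside text does not. The following is a rough sketch of that idea under my own assumptions, not a replacement for any of the answers here: the function name detectConsistentDelimiter, its parameters and the sample rows are all hypothetical.

```php
<?php
/**
 * Hypothetical variant: accept a candidate only if it yields the same
 * field count (greater than one) on every sampled, non-blank line.
 * Among accepted candidates, the one with the most fields wins.
 */
function detectConsistentDelimiter(
    array $lines,
    array $delimiters = [',', ';', "\t", '|'],
    string $default = ','
): string {
    $best = $default;
    $bestCount = 1;
    foreach ($delimiters as $d) {
        $counts = [];
        foreach ($lines as $line) {
            if (trim($line) === '') {
                continue; // ignore blank lines
            }
            // str_getcsv() keeps a quoted field together when parsing with $d.
            $counts[] = count(str_getcsv($line, $d, '"', '\\'));
        }
        $consistent = $counts !== [] && count(array_unique($counts)) === 1;
        if ($consistent && $counts[0] > $bestCount) {
            $best = $d;
            $bestCount = $counts[0];
        }
    }
    return $best;
}

// Made-up sample: semicolon-delimited rows with commas inside quoted text.
$sample = [
    'id;name;note',
    '1;"Smith, John";"a, b, c, d"',
    '2;"Doe, Jane";"x"',
];
$detected = detectConsistentDelimiter($sample);
```

With this sample the comma gives a different field count on each row and is rejected, while the semicolon gives three fields on every row. The sketch still falls back to the default when no candidate is consistent, so it is a heuristic rather than a guarantee.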