How to find out if CSV file fields are tab-delimited or comma-delimited

Tags:

php

How can I find out whether the fields in a CSV file are tab-delimited or comma-delimited? I need PHP validation for this. Can anyone please help? Thanks in advance.

SowmyAnil asked Aug 03 '10 09:08



4 Answers

It's late to answer this question, but I hope this helps someone.

Here's a simple function that returns the delimiter of a file.

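function getFileDelimiter($file, $checkLines = 2){
        $file = new SplFileObject($file);
        // Candidate delimiters; "\t" must be double-quoted to be a real tab character
        $delimiters = array(
          ',',
          "\t",
          ';',
          '|',
          ':'
        );
        $results = array();
        $i = 0;
        // Inspect at most $checkLines lines
        while($file->valid() && $i < $checkLines){
            $line = $file->fgets();
            foreach ($delimiters as $delimiter){
                // preg_quote() escapes characters such as '|' that are special in a regex
                $regExp = '/'.preg_quote($delimiter, '/').'/';
                $fields = preg_split($regExp, $line);
                // Count the lines that this delimiter splits into more than one field
                if(count($fields) > 1){
                    if(!empty($results[$delimiter])){
                        $results[$delimiter]++;
                    } else {
                        $results[$delimiter] = 1;
                    }
                }
            }
           $i++;
        }
        // No candidate matched any line; max() would fail on an empty array
        if(empty($results)){
            return null;
        }
        // Return the delimiter that matched on the most lines
        $results = array_keys($results, max($results));
        return $results[0];
    }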

Use this function as shown below:

$delimiter = getFileDelimiter('abc.csv'); //Check 2 lines to determine the delimiter
$delimiter = getFileDelimiter('abc.csv', 5); //Check 5 lines to determine the delimiter

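P.S. I have used preg_split() instead of explode() because explode('\t', $value) won't give proper results: inside single quotes, '\t' is a literal backslash followed by the letter t, not a tab character. explode("\t", $value) with double quotes does split on tabs.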
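
The quoting difference is easy to check. Here is a minimal sketch; the sample string is made up for illustration:

```php
<?php
// Hypothetical tab-separated line used only for this demonstration.
$value = "id\tname\temail";

// Single quotes: '\t' is the two characters backslash and t, so nothing is split.
$single = explode('\t', $value);

// Double quotes: "\t" is a real tab character, so the line splits into three fields.
$double = explode("\t", $value);

echo count($single), "\n"; // 1
echo count($double), "\n"; // 3
```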

UPDATE: Thanks to @RichardEB for pointing out a bug in the code. I have fixed it now.

Jay Bhatt answered Nov 15 '22 10:11



Here's what I do.

  1. Parse the first 5 lines of a CSV file
  2. Count the number of delimiters [commas, tabs, semicolons and colons] in each line
  3. Compare the delimiter counts across the lines. In a properly formatted CSV, the count for the real delimiter will be the same in every row.

This will not work 100% of the time, but it is a decent starting point. At minimum, it will reduce the number of possible delimiters (making it easier for your users to select the correct delimiter).

/* Rearrange this array to change the search priority of delimiters */
$delimiters = array('tab'       => "\t",
                'comma'     => ",",
                'semicolon' => ";"
                );

$handle = file( $file );    # Grabs the CSV file, loads into array

$line = array();            # Stores the count of delimiters in each row

$valid_delimiter = array(); # Stores Valid Delimiters

# Count the number of Delimiters in each of the first 5 rows
# (rows are 0-indexed; stop early if the file has fewer than 5 lines)
$rows = min( 5, count( $handle ) );
for ( $i = 0; $i < $rows; $i++ ){
    foreach ( $delimiters as $key => $value ){
        $line[$key][$i] = count( explode( $value, $handle[$i] ) ) - 1;
    }
}


# Compare the Count of Delimiters in Each line
foreach ( $line as $delimiter => $count ){

    # The delimiter must occur in the first row, and every checked row must have the same count
    $match = ( $count[0] > 0 and count( array_unique( $count ) ) == 1 );

    if ( $match == true )    $valid_delimiter[] = $delimiter;

}//foreach

# Set Default delimiter to comma
$delimiter = ! empty( $valid_delimiter ) ? $valid_delimiter[0] : "comma";


/*  !!!! This is good enough for my needs since I have the priority set to "tab"
!!!! but you will want to have the user select from the delimiters in $valid_delimiter
!!!! if multiple delimiter counts match
*/

# The Delimiter for the CSV
echo $delimiters[$delimiter];
Dream Ideation answered Nov 15 '22 09:11



There is no 100% reliable way to determine this. What you can do is:

  • If you have a method to validate the fields you read, try to read a few fields using either separator and validate them against your method. If validation fails, use the other separator.
  • Count the occurrences of tabs and commas in the file. Usually one count is significantly higher than the other.
  • Last but not least: ask the user, and allow them to override your guesses.
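
The first suggestion could be sketched as follows. The function name guessDelimiterByParsing() and the validation rule (every sampled line must parse into the same number of fields, and more than one) are assumptions made for illustration, not part of the answer; substitute your own validation method if you have one.

```php
<?php
// Hypothetical helper: returns the first candidate delimiter for which every
// sampled line parses into the same number of fields (and more than one field).
function guessDelimiterByParsing(array $lines, array $candidates = array(",", "\t", ";"))
{
    if (empty($lines)) {
        return null; // nothing to validate against
    }
    foreach ($candidates as $delimiter) {
        $fieldCounts = array();
        foreach ($lines as $line) {
            // str_getcsv() honours quoted fields, so a comma inside "..." does not split
            $fieldCounts[] = count(str_getcsv($line, $delimiter, '"', '\\'));
        }
        // Assumed validation rule: identical field count on every line, and at least 2
        if (count(array_unique($fieldCounts)) === 1 && $fieldCounts[0] > 1) {
            return $delimiter;
        }
    }
    return null; // no candidate passed: fall back to asking the user
}

// Made-up sample: tab-delimited, with a comma inside a quoted field.
$sample = array(
    "name\tcity",
    "\"Smith, John\"\tOslo",
);
var_dump(guessDelimiterByParsing($sample) === "\t"); // bool(true)
```

A null return is the point at which the third suggestion applies: let the user pick the delimiter.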
relet answered Nov 15 '22 11:11



I just count the occurrences of each candidate delimiter in the CSV file; the one that occurs most often is probably the correct delimiter:

//The delimiters array to look through
$delimiters = array(
    'semicolon' => ";",
    'tab'       => "\t",
    'comma'     => ",",
);

//Load the csv file into a string
$csv = file_get_contents($file);

//Count how often each delimiter occurs in the file
$res = array();
foreach ($delimiters as $key => $delim) {
    $res[$key] = substr_count($csv, $delim);
}

//reverse sort the values, so the first element is the most frequent delimiter
arsort($res);

reset($res);
$first_key = key($res);

//assumes this snippet is the body of a function
return $delimiters[$first_key];
Thomas Lang answered Nov 15 '22 09:11
