Finding duplicate column values in a CSV

Tags: php, csv, fgetcsv

I'm importing a CSV that has 3 columns, and one of these columns could contain duplicate values.

I have 2 things to check:

1. The field 'NAME' is not null and is a string
2. The field 'ID' is unique

So far, I parse the CSV file once and check condition 1 (NAME is valid). If that check fails, the code simply breaks out of the while loop and stops.

My question is: how can I check that ID is unique?

I have fields like the following:

NAME,  ID,
Bob,   1,
Tom,   2,
James, 1,
Terry, 3,
Joe,   4,

This should output something like `Duplicate ID on line 3`.

Thanks

P.S. The real CSV file has more columns and can have around 100,000 records. I have simplified it here to focus on the duplicate ID problem.

Thanks

sipher_z asked Dec 01 '25 04:12


2 Answers

Give it a try:

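    $row = 1;            // line counter used in the messages
    $totalIDs = array(); // IDs seen so far, stored as array keys for fast lookup
    if (($handle = fopen('/tmp/test1.csv', "r")) !== FALSE)
    {
        // Note: a header line, if present, is counted as row 1
        while (($data = fgetcsv($handle)) !== FALSE)
        {
            // fgetcsv() returns every field as a string, so trim the padding
            // (such as the spaces after the commas in the sample data)
            $name = isset($data[0]) ? trim($data[0]) : '';
            if ($name === '')
            {
                echo "Name not set for row $row\n";
            }
            elseif (is_numeric($name))
            {
                echo "Name is not a string for row $row\n";
            }

            $id = isset($data[1]) ? trim($data[1]) : '';
            if ($id === '')
            {
                // A missing ID is reported once and never treated as a duplicate
                echo "ID not set for row $row\n";
            }
            elseif (isset($totalIDs[$id]))
            {
                echo "Duplicate ID on line $row\n";
            }
            else
            {
                $totalIDs[$id] = 1;
            }

            $row++;
        }
        fclose($handle);
    }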
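
The same lookup-by-array-key idea can be wrapped in a function that skips the header line, so the reported line numbers match the question's example. This is only a sketch: `findDuplicateIds` is a hypothetical name, and it assumes the first line is a header, that ID is the second column, and that IDs should be compared after trimming whitespace.

```php
<?php
// Hypothetical helper: returns one message per duplicate ID found in $path.
// Assumes line 1 is a header and that ID is column 1 (NAME is column 0).
function findDuplicateIds(string $path): array
{
    $messages = [];
    $seen = [];   // ID => data line on which it first appeared
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return $messages;
    }

    // All fgetcsv() arguments are passed explicitly; length 0 means "no limit"
    fgetcsv($handle, 0, ',', '"', '\\');   // discard the header line
    $line = 0;                             // data lines are numbered from 1
    while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        $line++;
        $id = isset($data[1]) ? trim($data[1]) : '';
        if ($id === '') {
            continue;   // blank line or missing ID; validate that separately
        }
        if (isset($seen[$id])) {
            $messages[] = "Duplicate ID on line $line (first seen on line {$seen[$id]})";
        } else {
            $seen[$id] = $line;
        }
    }
    fclose($handle);
    return $messages;
}

// Demo with the sample data from the question, written to a temporary file
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "NAME,  ID,\nBob,   1,\nTom,   2,\nJames, 1,\nTerry, 3,\nJoe,   4,\n");
foreach (findDuplicateIds($path) as $message) {
    echo $message, "\n";   // prints: Duplicate ID on line 3 (first seen on line 1)
}
unlink($path);
```

Because each ID is stored once as an array key, the lookup cost per row stays roughly constant, which should be comfortable for a file of around 100,000 records.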

dcapilla answered Dec 02 '25 17:12


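    <?php
    $cnt = 0;           // number of lines read so far, header included
    $arr = array();     // ID => name for every ID seen so far
    $arrdup = array();  // one message per duplicate ID
    if (($handle = fopen("1.csv", "r")) !== FALSE) {
        while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
            $cnt++;
            // Column 0 is NAME, column 1 is ID; skip lines without an ID column
            if (!isset($data[1])) {
                continue;
            }
            $id = trim($data[1]);
            if (is_numeric($id)) {   // also skips the non-numeric header line
                if (array_key_exists($id, $arr))
                    $arrdup[] = "duplicate value at ".($cnt-1);  // line number excluding the header
                else
                    $arr[$id] = trim($data[0]);
            }
        }
        fclose($handle);
    }
    print_r($arrdup);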
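
If it is enough to know which IDs repeat, rather than on which line, a different approach is to collect the ID column first and count the values afterwards with `array_count_values()`. The following is a rough sketch of that alternative, not the method used above: `duplicatedIds` is a hypothetical name, and it assumes a header line and the ID in the second column.

```php
<?php
// Hypothetical helper: returns ID => occurrence count for every ID that
// appears more than once. Assumes line 1 is a header and ID is column 1.
function duplicatedIds(string $path): array
{
    $ids = [];
    $handle = fopen($path, 'r');
    if ($handle === false) {
        return [];
    }

    fgetcsv($handle, 0, ',', '"', '\\');   // discard the header line
    while (($data = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        // Keep only non-empty IDs, trimmed so " 1" and "1" compare equal
        if (isset($data[1]) && trim($data[1]) !== '') {
            $ids[] = trim($data[1]);
        }
    }
    fclose($handle);

    // Count how often each ID occurs, then keep those seen more than once
    return array_filter(array_count_values($ids), function ($count) {
        return $count > 1;
    });
}
```

This keeps every ID in memory as a plain list before counting, so it trades the per-line message for a simple summary of the repeated IDs.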

sumit answered Dec 02 '25 19:12