
How to validate csv file?

Tags: php, csv

How can we validate a CSV file?

I have a CSV file with the following structure:

Date;Id;Shown
15-Mar-10;231;345
15-Mar-10;232;346
and so on, for roughly 80,000 rows.

How can I validate this CSV file before I start parsing it with fgetcsv?

Rachel asked Mar 15 '10 20:03




1 Answer

I would not try to validate the file beforehand: I would rather go through it line by line, dealing with each line separately:

  • Reading one line
  • Verifying it's OK
  • using the data
  • and going to next line.


Now, what could "verifying it's OK" mean?

  • At least: make sure I can read the line as CSV with my normal set of functions (maybe fgetcsv, maybe some other function specific to my project). If the function that reads hundreds of lines cannot read this one, there is probably a problem on that line.
  • Then, check the number of fields
  • Then, for each field, check whether it contains "valid" data
    • mandatory? optional?
    • numeric?
    • string?
    • date?
    • and so on
  • Then, for each field, some more careful checks
    • for instance, for a "code" field: does it correspond to a value that's legal for my application?

If all that goes OK, well, there's not much more to do except use the data ;-)
And when you're done with one line, just repeat for the next one.
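
The loop described above could be sketched like this. It is only a minimal illustration, not a complete validator: `validateRow` and `checkCsvStream` are hypothetical names, the `;` delimiter and the header line are taken from the sample in the question, and the only check shown is the field count.

```php
<?php
// Hypothetical helper: returns null when the row looks fine,
// or a short error message when it does not.
function validateRow(array $row, int $expectedFields): ?string
{
    // fgetcsv() returns an array holding a single null for a blank line.
    if ($row === [null]) {
        return 'blank line';
    }
    if (count($row) !== $expectedFields) {
        return 'expected ' . $expectedFields . ' fields, got ' . count($row);
    }
    // Per-field checks (date, integer, ...) would go here.
    return null;
}

// Hypothetical driver: reads an open stream record by record
// and returns the list of error messages it collected.
function checkCsvStream($handle, int $expectedFields = 3): array
{
    $errors = [];
    $lineNo = 0;
    while (($row = fgetcsv($handle, 0, ';', '"', '\\')) !== false) {
        $lineNo++;
        if ($lineNo === 1) {
            continue; // assumption: the first line is the header (Date;Id;Shown)
        }
        $error = validateRow($row, $expectedFields);
        if ($error !== null) {
            $errors[] = "line $lineNo: $error";
            continue; // skip the bad row, move on to the next one
        }
        // ... use the data in $row here ...
    }
    return $errors;
}
```

You would call it with a handle from something like `fopen('data.csv', 'r')` (the file name is just a placeholder). Since only one row is held at a time, memory use should stay low even for 80,000 rows.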


Of course, if you want to either accept or reject a whole file before doing any database write (or anything like that), you'll have to:

  • parse the file, line by line, applying the "verifying" ideas
  • store the data of each line in memory
  • and, when the whole file has been read to memory,
    • either start using the data
    • or, if there's been an error on one line, reject everything.
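
A rough sketch of that all-or-nothing variant follows. `loadAllOrNothing` is a hypothetical name, and the row check is passed in as a callable so you can plug in whatever rules your application needs; the `;` delimiter and the header line are assumptions based on the sample.

```php
<?php
// Hypothetical helper: returns every data row if the whole file is valid,
// and throws as soon as one row fails, so nothing gets used or written.
function loadAllOrNothing($handle, callable $isValidRow): array
{
    $rows = [];
    $lineNo = 0;
    while (($row = fgetcsv($handle, 0, ';', '"', '\\')) !== false) {
        $lineNo++;
        if ($lineNo === 1) {
            continue; // assumption: the first line is the header
        }
        if (!$isValidRow($row)) {
            throw new RuntimeException("Rejecting file: invalid row at line $lineNo");
        }
        $rows[] = $row; // keep the row in memory until the file is fully read
    }
    return $rows;
}
```

Only once the function has returned without throwing would you start the database writes. With about 80,000 short rows, holding everything in memory is probably fine, but that is worth measuring on your own data.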


In your specific case, you have three kinds of fields:

Date;Id;Shown
15-Mar-10;231;345
15-Mar-10;232;346

From what I can guess:

  • The first one must be a date
    • Using a regex to validate that will not be easy: months do not all have the same number of days, and the number of days in February depends on the year.
    • In such a case, I would probably try to parse the date with something like strtotime (not sure it's OK for the format you're using, though)
    • Or I would just explode the string
      • making sure there are three parts
      • that the third one is 2 digits
      • that the second one is one of Jan, Feb, Mar, ...
      • that the first one is a valid day number, given the two others
  • The second one:
    • must be an integer
    • must be a valid value that exists in your database?
      • If so, a simple SQL query will allow you to check that
  • For the third one, I'm not really sure...
    • I'm guessing it has to be an integer?
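
Here is one way the "explode the string" idea and the integer checks might look. The function names are hypothetical, and two things are assumptions on my part: that a two-digit year means 20xx, and that Id and Shown are non-negative integers. The database lookup for the Id is left out.

```php
<?php
// Hypothetical helper: checks a date written like 15-Mar-10.
// Assumption: the two-digit year is in the 2000s.
function isValidDate(string $value): bool
{
    $months = [
        'Jan' => 1, 'Feb' => 2, 'Mar' => 3, 'Apr' => 4,
        'May' => 5, 'Jun' => 6, 'Jul' => 7, 'Aug' => 8,
        'Sep' => 9, 'Oct' => 10, 'Nov' => 11, 'Dec' => 12,
    ];
    $parts = explode('-', $value);
    if (count($parts) !== 3) {
        return false; // must be exactly day-month-year
    }
    [$day, $month, $year] = $parts;
    if (!preg_match('/^\d{1,2}\z/', $day)      // day: one or two digits
        || !isset($months[$month])             // month: Jan, Feb, Mar, ...
        || !preg_match('/^\d{2}\z/', $year)) { // year: exactly two digits
        return false;
    }
    // checkdate() knows the month lengths and leap years.
    return checkdate($months[$month], (int) $day, 2000 + (int) $year);
}

// Hypothetical helper: true for strings made only of digits, such as "231".
function isNonNegativeInteger(string $value): bool
{
    return ctype_digit($value);
}

// Hypothetical helper: applies the three field checks to one Date;Id;Shown row.
function isValidRow(array $row): bool
{
    return count($row) === 3
        && is_string($row[0]) && isValidDate($row[0])
        && is_string($row[1]) && isNonNegativeInteger($row[1])
        && is_string($row[2]) && isNonNegativeInteger($row[2]);
}
```

Leaving the calendar logic to `checkdate` avoids hand-coding the days-per-month rules, which is the part a regex handles badly.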
Pascal MARTIN answered Sep 20 '22 03:09
