
How can I parse quoted CSV in Perl with a regex?

Tags: regex, csv, perl

I'm having some issues parsing CSV data that contains quotes. My main problem is quotes within a field. In the following example, lines 1-4 parse correctly but lines 5, 6 and 7 don't.

COLLOQ_TYPE,COLLOQ_NAME,COLLOQ_CODE,XDATA
S,"BELT,FAN",003541547,
S,"BELT V,FAN",000324244,
S,SHROUD SPRING SCREW,000868265,
S,"D" REL VALVE ASSY,000771881,
S,"YBELT,"V"",000323030,
S,"YBELT,'V'",000322933,

I'd like to avoid Text::CSV as it isn't installed on the target server. Realising that CSVs are more complicated than they look, I'm using a recipe from the Perl Cookbook.

sub parse_csv {
  my $text = shift; # record containing comma-separated values
  my @columns = ();
  push(@columns ,$+) while $text =~ m{
    # The first part groups the phrase inside quotes
    "([^\"\\]*(?:\\.[^\"\\]*)*)",?
      | ([^,]+),?
      | ,
    }gx;
  push(@columns ,undef) if substr($text, -1,1) eq ',';
  return @columns ; # list of the values that were comma-separated
}

Does anyone have a suggestion for improving the regex to handle the above cases?

Asked by Mark Nold, Mar 11 '09 07:03



1 Answer

Please, Try Using CPAN

There's no reason you couldn't download a copy of Text::CSV, or any other non-XS implementation of a CSV parser, and install it in your local directory or in a lib/ subdirectory of your project, so that it's installed along with your project's rollout.

If you can't store text files in your project, then I'm wondering how it is you are coding your project.

http://novosial.org/perl/life-with-cpan/non-root/ should be a good guide on how to get these modules into a working state locally.
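As a rough sketch of what the lib/ approach could look like: the script below assumes a pure-Perl copy of Text::CSV has been unpacked into a lib/ directory next to the script (that layout, and the helper name parse_line, are assumptions for illustration, not something from the question). It only uses the documented new, parse, fields and error_diag calls.

```perl
use strict;
use warnings;
use FindBin qw($Bin);
use lib "$Bin/lib";    # assumed layout: lib/Text/CSV.pm shipped with the project

my $csv;               # parser object, created on first use

# Hypothetical helper: returns the fields of one CSV line,
# or an empty list if Text::CSV cannot parse the line.
sub parse_line {
    my ($line) = @_;

    unless ($csv) {
        # Load at run time so a missing module gives a clear message.
        eval { require Text::CSV; 1 }
            or die "Text::CSV not found (looked in $Bin/lib too): $@";
        $csv = Text::CSV->new({ binary => 1 })
            or die "Cannot use Text::CSV: " . Text::CSV->error_diag;
    }

    return $csv->parse($line) ? $csv->fields : ();
}
```

Called as parse_line('S,"BELT,FAN",003541547,'), this should give back four fields, with the comma kept inside the second one. The rows with unescaped inner quotes are not valid CSV, so expect the default settings to reject them unless the data is fixed or the parser is told to be lenient.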

Not using CPAN is really a recipe for disaster.

Please consider this before trying to write your own CSV implementation.

Text::CSV is over a hundred lines of code, including fixed bugs and edge cases, and re-writing this from scratch will just make you learn how awful CSV can be the hard way.

Note: I learnt this the hard way. It took me a full day to get a working CSV parser in PHP before I discovered that a built-in one had been added in a later version. It really is something awful.
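To make the edge-case point concrete, here is a minimal sketch of a strict parser in core Perl. It is not a replacement for Text::CSV. It assumes the usual convention that a quote inside a quoted field is written as two quotes, and that the line has already been chomped. The name parse_strict is made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one chomped line under the assumed rule that
# quotes inside a quoted field are doubled ("").
# Returns an array ref of fields, or undef if the line breaks that rule.
sub parse_strict {
    my ($line) = @_;
    my @fields;
    pos($line) = 0;

    while (1) {
        my ($field, $sep);

        # Quoted field: anything but a quote, or a doubled quote,
        # closed by a quote that is followed by a comma or end of line.
        if ($line =~ /\G"((?:[^"]|"")*)"(,|\z)/gc) {
            ($field, $sep) = ($1, $2);
            $field =~ s/""/"/g;    # un-double the embedded quotes
        }
        # Unquoted field: no quotes and no commas allowed inside.
        elsif ($line =~ /\G([^",]*)(,|\z)/gc) {
            ($field, $sep) = ($1, $2);
        }
        else {
            return;                # malformed: stray or unescaped quote
        }

        push @fields, $field;
        last if $sep eq '';        # reached end of line
    }
    return \@fields;
}
```

With this, S,"BELT,FAN",003541547, parses into four fields, while the "D" REL VALVE ASSY and "YBELT,"V"" rows from the question come back as undef, because nothing in the line says where those quoted fields are meant to end. Written as "YBELT,""V""" the second one parses to YBELT,"V". Deciding what to do with the rejected rows is exactly the guessing game a hand-rolled parser drags you into.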

Answered by Kent Fredric, Sep 21 '22 15:09