Processing CSV file without double quotes

I am looking for a way to ignore the comma inside one of the fields.

The field should be treated as one single field even though it contains a comma.

Example:

Round,Winner,place,prize
1,xyz,1,$4,500

If I read this with csv.DictReader, $4,500 is read as $4 because 500 is treated as a separate field. This makes sense, since I am reading the file as comma-delimited, so I can't really complain; I just need to find a workaround.

reader = csv.reader(f, delimiter=',', quotechar='"')
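To make the symptom concrete, here is a minimal sketch of the same read. It uses an in-memory io.StringIO holding the example rows in place of the real file f (an assumption for illustration). csv.DictReader assigns $4 to prize and stores the leftover 500 under the key None, which is DictReader's default restkey:

```python
import csv
import io

# The example data from the question, with no quoting around $4,500
data = "Round,Winner,place,prize\n1,xyz,1,$4,500\n"

reader = csv.DictReader(io.StringIO(data), delimiter=',', quotechar='"')
row = next(reader)

# The comma inside the amount splits it into two fields
print(row["prize"])  # $4
# Extra values beyond the header columns are collected under restkey (None by default)
print(row[None])     # ['500']
```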

The fields in my source are not wrapped in double quotes, so I can't rely on quotechar to keep the comma inside the field.

Is there any other way to handle this scenario? Perhaps something like defining these dollar fields and making the reader ignore commas in them? Or inserting quotes around this field?

If not Python, could shell script or Perl be used to do it?

jb04 asked Jan 05 '23 15:01


1 Answer

Perhaps pre-process the data to wrap all money amounts in double quotes, then process the file normally:

$line =~ s/( \$\d+ (?:,\d{3})* (?:\.\d{2})? )/"$1"/gx;

The pattern matches digits following a $, optionally followed by any number of ,nnn groups and/or by one .nn. It also wraps $4.22 as well as $100, which I consider good for consistency. Restrict what gets matched if needed, for example to (\$\d{1,3},\d{3}). To allow fractional cents, change \.\d{2} to \.\d+. This doesn't cover all possible edge or broken cases.

The /g modifier makes it replace all such matches in the line, and /x allows spaces in the pattern for readability.
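Since the question is about Python, here is a sketch of the same substitution with the re module, using the pattern above unchanged. re.VERBOSE plays the role of /x, and re.sub replaces every match by default, like /g. The helper name quote_money is hypothetical:

```python
import csv
import re

# Same pattern as the Perl version: $ then digits, any number of ,nnn groups,
# and an optional .nn; re.VERBOSE ignores the spaces, like Perl's /x
MONEY = re.compile(r""" ( \$\d+ (?:,\d{3})* (?:\.\d{2})? ) """, re.VERBOSE)

def quote_money(line: str) -> str:
    """Wrap every money amount in the line in double quotes (hypothetical helper)."""
    return MONEY.sub(r'"\1"', line)

print(quote_money("1,xyz,1,$4,500"))  # 1,xyz,1,"$4,500"
print(quote_money("2,abc,2,$4.22"))   # 2,abc,2,"$4.22"
print(quote_money("3,def,3,$100"))    # 3,def,3,"$100"

# Once quoted, the csv module keeps the amount as a single field
fields = next(csv.reader([quote_money("1,xyz,1,$4,500")]))
print(fields)  # ['1', 'xyz', '1', '$4,500']
```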

You can do it as a one-liner

perl -pe 's/(\$\d+(?:,\d{3})*(?:\.\d{2})?)/"$1"/g' input.csv  > changed.csv

Add the -i switch to overwrite the input file ("in-place"), or -i.bak to also keep a backup.


If you anticipate further need for tweaks, or want to document this better, put it in a script:

use warnings;
use strict;

my $file = '...';
my $fout = '...';

open my $fh,     '<', $file or die "Can't open $file: $!";
open my $fh_out, '>', $fout or die "Can't open $fout for writing: $!";

while (my $line = <$fh>) {
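    # Wrap amounts like $4,500, $4.22 and $100 in double quotes
    $line =~ s/( \$\d+ (?:,\d{3})* (?:\.\d{2})? )/"$1"/gx;
    print $fh_out $line;    # the lexical filehandle needs its $ sigil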
}

close $fh;
close $fh_out;
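In Python the intermediate file can be skipped entirely: csv.reader and csv.DictReader accept any iterable of strings, so a generator can quote the amounts line by line as the reader pulls them. This is a minimal sketch, assuming the money amounts are the only fields that contain commas; the function names quoted_lines and read_rows are hypothetical:

```python
import csv
import re
from typing import Iterable, Iterator

# Money amount: $ then digits, any number of ,nnn groups, optional .nn
MONEY = re.compile(r"(\$\d+(?:,\d{3})*(?:\.\d{2})?)")

def quoted_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield each input line with money amounts wrapped in double quotes."""
    for line in lines:
        yield MONEY.sub(r'"\1"', line)

def read_rows(lines: Iterable[str]) -> Iterator[dict]:
    """Parse CSV lines into dicts, quoting money amounts on the fly."""
    # DictReader pulls lines lazily from the generator, so no temp file is written
    yield from csv.DictReader(quoted_lines(lines))

# Demo on in-memory lines; a real file object works the same way
sample = ["Round,Winner,place,prize\n", "1,xyz,1,$4,500\n"]
for row in read_rows(sample):
    print(row["prize"])  # $4,500
```

With a real file, open it with newline="" (as the csv module documentation recommends) and pass the file object to read_rows. This keeps memory use flat for large files, at the cost of not leaving a corrected copy on disk the way the Perl script does.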
zdim answered Jan 14 '23 01:01