I have a set of .csv files that I'm trying to clean up. Each has data like this:
x0,"","",""
x1,123,456,789
x2,123,456,789
x3,123,456,789
-,"","",""
x4,123,456,789
[space],____,____,____
x5,123,456,789
x6,===,====,======
x7,---,--------=--,-------
I want to delete all lines that are not of the form xn,###,###,###, so in this example that would be lines 1, 5, 7, 9, and 10. In the Cygwin command line, I type the following commands one by one:
sed -i '/"",""/d' *.csv
sed -i '/___/d' *.csv
sed -i '/---/d' *.csv
sed -i '/===/d' *.csv
and these all work. However, when I try to put them together into a Perl script (the rest of my code is in Perl), they fail:
system("sed -i '/"",""/d' *.csv");
system("sed -i '/___/d' *.csv");
system("sed -i '/---/d' *.csv");
system("sed -i '/===/d' *.csv");
and I get the result:
String found where operator expected at test1.pl line 1, near ""sed -i '/"",""
(Missing operator before ","?)
String found where operator expected at test1.pl line 1, near "",""/d' *.csv""
(Missing operator before "/d' *.csv"?)
syntax error at test1.pl line 1, near ""sed -i '/"",""
I notice all work except that first command -- is there something special about "" in sed? Any help would be appreciated! A simpler solution is welcome as well!
If the rest of your script is in Perl, I would strongly suggest replacing your calls to sed with a native implementation.
For example, the deletions you are doing with sed could be written like this:
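use strict;
use warnings;

for my $file (glob '*.csv') {
    open my $in, '<', $file or die "Cannot read $file: $!";
    my @lines;
    while (<$in>) {
        # skip every line that matches one of the unwanted patterns
        next if /"",""/;
        next if /___/;
        next if /---/;
        next if /===/;
        push @lines, $_;
    }
    close $in;

    # this will overwrite your files!
    # change $file to something else to test
    open my $out, '>', $file or die "Cannot write $file: $!";
    print $out $_ for @lines;
    close $out or die "Cannot close $file: $!";
}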
This loops through each file ending in .csv, reading each line. It skips any lines that match one of the patterns (you could do this using a single regex with | between each pattern if you wanted but I left it the same as your calls to sed). It pushes any remaining lines to an array. It then reopens the input file for writing and prints the array.
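The single-regex variant mentioned above might look roughly like the sketch below. The helper name keep_lines and the variable $junk are made up for illustration, and the sample rows are copied from the question; it filters an in-memory list rather than touching any files.

```perl
use strict;
use warnings;

# One alternation covering the four patterns from the separate sed calls.
my $junk = qr/"",""|___|---|===/;

# Hypothetical helper: returns only the lines that do NOT match $junk.
sub keep_lines {
    my @lines = @_;
    return grep { !/$junk/ } @lines;
}

# Sample rows taken from the question.
my @sample = (
    qq{x0,"","",""\n},
    "x1,123,456,789\n",
    "x2,123,456,789\n",
    "x3,123,456,789\n",
    qq{-,"","",""\n},
    "x4,123,456,789\n",
    "[space],____,____,____\n",
    "x5,123,456,789\n",
    "x6,===,====,======\n",
    "x7,---,--------=--,-------\n",
);

# Prints the x1 .. x5 rows only.
print keep_lines(@sample);
```

Inside the file loop, this would replace the four "next if" lines with a single "next if /$junk/;". Whether that reads better than four separate checks is mostly a matter of taste.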
Granted, it's slightly longer in terms of number of lines, but it saves you having to use system to call external commands when Perl is more than capable. It also means that each file is only read and rewritten once, rather than once per sed command as in your original code.
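As for the syntax error itself: it comes from Perl, not from sed. The double quotes inside the sed pattern end the double-quoted Perl string early, so Perl sees several strings in a row with no operator between them. If you do want to keep calling sed, one way around it (a sketch, assuming sed is on your PATH) is to build the command with q{...}, or to escape the inner quotes by hand:

```perl
use strict;
use warnings;

# q{...} is a single-quoted string with different delimiters, so neither the
# double quotes nor the single quotes in the sed command need escaping.
my $cmd = q{sed -i '/"",""/d' *.csv};

# The same command with the inner double quotes escaped by hand.
my $escaped = "sed -i '/\"\",\"\"/d' *.csv";

# Show the command that would be passed to the shell.
print "$cmd\n";

# Uncomment to actually run it (this modifies the .csv files in place):
# system($cmd) == 0 or die "sed failed: $?";
```

Both variables hold exactly the same string; the q{...} form is just easier to read.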