How do I use Perl to intersperse characters between consecutive matches with a regex substitution?

Question

The following lines of comma-separated values contain several consecutive empty fields:

$rawData = 
"2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear

2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,
"

I want to replace these empty fields with 'N/A', and I decided to do it with a regex substitution.

I tried this first of all:

$rawData =~ s/,([,\n])/,N\/A$1/g; # RELABEL UNAVAILABLE DATA AS 'N/A'

which returned

2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear

2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,,N/A,

Not what I wanted. The problem arises when more than two commas occur in a row. The regex consumes two commas at a time, so when it resumes scanning it starts at the third comma rather than the second.

I thought this might have something to do with lookahead vs. lookbehind assertions, so I tried the following regex:

$rawData =~ s/(?<=,)([,\n])|,([,\n])$/,N\/A$1/g; # RELABEL UNAVAILABLE DATA AS 'N/A'

which resulted in:

2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,N/A,Clear

2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,N/A,,N/A,,N/A,,N/A

That didn't work either. It just shifted the comma-pairings by one.

I know that washing this string through the same regex twice will do it, but that seems crude. Surely, there must be a way to get a single regex substitution to do the job. Any suggestions?

The final string should look like this:

2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear

2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,N/A,N/A,N/A

Sinan Ünür · Accepted Answer

EDIT: Note that you could open a filehandle to the data string and let readline deal with line endings:

#!/usr/bin/perl

use strict; use warnings;
use autodie;

my $str = <<EO_DATA;
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,
EO_DATA

open my $str_h, '<', \$str;

while(my $row = <$str_h>) {
    chomp $row;
    print join(',',
        map { length $_ ? $_ : 'N/A'} split /,/, $row, -1
    ), "\n";
}

Output:

E:\Home> t.pl
2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,N/A,Clear
2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,N/A,N/A,N/A,N/A

You can also use:

pos $str -= 1 while $str =~ s{,(,|\n)}{,N/A$1}g;

Explanation: When s/// matches ,, and replaces it with ,N/A, (the trailing comma comes from $1), the search position is already past that second comma. The second comma therefore cannot start the next match, so some consecutive commas are missed if you only use

$str =~ s{,(,|\n)}{,N/A$1}g;

Therefore, I used a loop to move pos $str back by a character after each successful substitution.
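To make the skipped commas visible, here is a minimal sketch that records where the consuming pattern matches. The sub name consuming_match_offsets and the sample string are hypothetical, chosen only for illustration:

```perl
use strict; use warnings;

# Hypothetical helper: return the start offset of every match of the
# consuming pattern ,(,|\n) in the given string.
sub consuming_match_offsets {
    my ($text) = @_;
    my @offsets;
    while ($text =~ /,(,|\n)/g) {
        push @offsets, $-[0];   # $-[0] is the offset where this match began
    }
    return @offsets;
}

# Tail of the second data row: the commas sit at offsets 4, 5, 6 and 7.
my $sample = "10.4,,,,\n";
print join(' ', consuming_match_offsets($sample)), "\n";   # prints "4 6"
```

Only the commas at offsets 4 and 6 start a match. The commas at offsets 5 and 7 are consumed as the second character of a match, so the empty fields that follow them are never filled in a single pass.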

Now, as @ysth shows:

$str =~ s!,(?=[,\n])!,N/A!g;

would make the while unnecessary.
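As a rough end-to-end check, the lookahead substitution can be wrapped in a small sub and run against the sample data from the question. The sub name fill_empty_fields is hypothetical; the sketch assumes every row ends with a newline and that the first field of a row is never empty:

```perl
use strict; use warnings;

# Hypothetical helper: fill empty fields with 'N/A' in one pass.
# The lookahead (?=[,\n]) inspects the next character without consuming it,
# so that character is still available as the start of the next match.
sub fill_empty_fields {
    my ($text) = @_;
    $text =~ s!,(?=[,\n])!,N/A!g;
    return $text;
}

my $data = "2008-02-06,8:00 AM,14.0,6.0,59,1027,-9999.0,West,6.9,-,N/A,,Clear\n"
         . "2008-02-06,9:00 AM,16,6,40,1028,12,WNW,10.4,,,,\n";

print fill_empty_fields($data);
```

This should print the same two rows as the split/join version above. An empty field at the very start of a row, or an empty last field on a row with no trailing newline, would not be matched by this pattern.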

Tags:

regex

perl

substitution

Zaid
