Perl parse String with one or more fields

Tags: regex, perl

I have a string I need to parse. It meets the following requirements:

  • It consists of 0 or more key->value pairs.
  • The key is always 2 letters.
  • The value is one or more digits.
  • There will not be a space between the key and value.
  • There may or may not be a space between individual pairs.

Example strings I may see:

  • AB1234 //One key->value pair (Key=AB, Value=1234)
  • AB1234 BC2345 //Two key->value pairs, separated by space
  • AB1234BC2345 //Two key->value pairs, not separated by space
  • //Empty String, No key->value pairs
  • AB12345601BC1234CD1232PE2343 //Lots of key->value pairs, no space
  • AB12345601 BC1234 CD1232 PE2343 //Lots of key->value pairs, with spaces

I need to build a Perl hash of this string. If I could guarantee it was 1 pair I would do something like this:

if ($string =~ /([A-Z][A-Z])([0-9]+)/) {
    $key = $1;
    $value = $2;
    $hash{$key} = $value;
}

For multiple pairs, I could do something where, after each match of the above regex, I take a substring of the original string (excluding the first match) and then search again. However, I'm sure there's a more clever, Perl-esque way to achieve this.

Wishing I didn't have such a crappy data source to deal with-

Jonathan

Jonathan asked Dec 21 '22 05:12


1 Answer

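In list context with the /g flag, a match with capture groups returns the captured substrings from every match as one flat list (key, value, key, value, ...), which can be assigned straight to a hash: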

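use strict;
use warnings;
use Data::Dumper;

my @strs = (
    'AB1234',
    'AB1234 BC2345',
    'AB1234BC2345',
    '',
    'AB12345601BC1234CD1232PE2343',
    'AB12345601 BC1234 CD1232 PE2343',
);

for my $str (@strs) {
    # The money line: in list context, /g returns every captured
    # (key, value) pair as one flat list, which populates the hash.
    my %parts = ($str =~ /([A-Z][A-Z])(\d+)/g);

    print Dumper(\%parts);
}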

For greater opacity, drop the parentheses around the match: %parts = $str =~ /([A-Z][A-Z])(\d+)/g; behaves the same, because assigning to a hash already puts the match in list context.
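
If you need to do something per pair (for example, notice a key that appears twice, which the hash assignment would silently overwrite), the same pattern can be walked one match at a time. This is only a sketch: parse_pairs is a hypothetical helper name, and warning on duplicates is an assumption about what you would want, not something the question specifies.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): parse a string of
# two-letter keys followed by digits into a hash reference.
sub parse_pairs {
    my ($str) = @_;
    my %parts;

    # In scalar context, /g resumes where the previous match ended,
    # so each loop iteration sees exactly one key/value pair.
    while ($str =~ /([A-Z]{2})(\d+)/g) {
        my ($key, $value) = ($1, $2);
        warn "duplicate key $key\n" if exists $parts{$key};
        $parts{$key} = $value;    # a later value overwrites an earlier one
    }
    return \%parts;
}

my $parts = parse_pairs('AB12345601 BC1234CD1232 PE2343');
print "$_ => $parts->{$_}\n" for sort keys %$parts;
```

Like the one-liner, this skips anything between pairs (spaces or otherwise), so it does not validate the input; it only extracts what looks like a pair.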

outis answered Dec 29 '22 19:12