How can I parse a string into a hash using keywords in Perl?

I have a string in which predefined keywords introduce different pieces of data, and I want to split it into a hash keyed by those keywords. Is there a way to do that with a clever regexp, or something similar? Here is an example:

Keywords can be "first name: " and "last name: ". Now I want to parse:

"character first name: Han last name: Solo"

into

{ "first name: " => "Han ", "last name: " => "Solo" }

Of course, the order of the keywords in the input string is not fixed. This should also work on:

"character last name: Solo first name: Han"

I understand there are issues to be raised with spaces and so on. I'll ignore them here.

I know how to solve this problem by looping over the different keywords, but I don't find that very pretty.

split almost fits the bill. Its only problem is that it returns a list rather than a hash, so I don't know which element is the first name and which is the last name.

My example is somewhat misleading. Here is another one:

my @keywords = ("marker 1", "marker 2", "marker 3");
my $rawString = "beginning marker 1 one un marker 2 two deux marker 3 three trois and the rest";
my %result;
# <grind result>
print Dumper(\%result);

will print:

$VAR1 = {
      'marker 2' => ' two deux ',
      'marker 3' => ' three trois and the rest',
      'marker 1' => ' one un '
    };
Jean-Denis Muys asked Jan 23 '26 04:01


2 Answers

Here is a solution using split (with separator retention mode) that is extensible with other keys:

use warnings;
use strict;

my $str = "character first name: Han last name: Solo";

my @keys = ('first name:', 'last name:');

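# Build an alternation of the keywords: first name:|last name:
my $regex = join '|' => @keys;

# The capturing group makes split return the separators as well, so after
# the leading non-key text the list alternates key, value, key, value.
my ($prefix, %hash) = split /($regex)\s*/ => $str;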

print "$_ $hash{$_}\n" for keys %hash;

which prints:

last name: Solo
first name: Han 

To handle keys that contain regex metacharacters, replace the my $regex = ... line with:

 my $regex = join '|' => map {quotemeta} @keys;
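
As a minimal sketch, the same split technique can be applied to the second example from the question. The variable names come from the question; sorting the keywords longest-first is an extra precaution I am assuming is wanted, so that a keyword that is a prefix of another one does not win the alternation. The \s* after the separator is dropped here because the expected output keeps the surrounding spaces.

```perl
use strict;
use warnings;

my @keywords  = ("marker 1", "marker 2", "marker 3");
my $rawString = "beginning marker 1 one un marker 2 two deux marker 3 three trois and the rest";

# Escape each keyword and try longer keywords first (assumed precaution).
my $regex = join '|',
    map  { quotemeta }
    sort { length($b) <=> length($a) } @keywords;

# The capturing group keeps the separators in the returned list, so after
# the leading text the list alternates keyword, data, keyword, data.
my ($prefix, %result) = split /($regex)/, $rawString;

# Brackets make the leading and trailing spaces visible.
print "$_ => [$result{$_}]\n" for sort keys %result;
```

Here $prefix receives "beginning " and %result maps 'marker 1' to ' one un ', 'marker 2' to ' two deux ' and 'marker 3' to ' three trois and the rest'. One caveat: if the string ends with a keyword, split drops the trailing empty field, so the hash assignment gets an odd number of elements and warns.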
Eric Strom answered Jan 25 '26 09:01


The script below walks over the string once, after collapsing newlines into spaces, and collects every match. You can avoid the explicit loop only if each keyword appears at most once in the text. If that were the case, you could write (with $re built as in the script further down)

my %matches = $string =~ /($re):\s+(\S+)/g;

and be done with it.
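
To see why that one-liner works: in list context, a match with /g returns all captured groups from all matches as one flat list, which can be assigned straight to a hash. Here is a rough sketch of it in isolation. Building $re with join and quotemeta is my own stand-in so the snippet needs no CPAN modules, and the sample string is the one from the question.

```perl
use strict;
use warnings;

# Stand-in for the presuf() call: a plain alternation of escaped keywords.
my @keywords = ('first name', 'last name');
my $re       = join '|', map { quotemeta } @keywords;

my $string = "character last name: Solo first name: Han";

# List-context //g yields ($1, $2, $1, $2, ...), i.e. key/value pairs.
my %matches = $string =~ /($re):\s+(\S+)/g;

print "$_ => $matches{$_}\n" for sort keys %matches;
```

This prints "first name => Han" and "last name => Solo". Note that the keys have no trailing colon, because the colon sits outside the first capture group, and that \S+ captures only a single word of data. If a keyword occurs twice, the later value silently overwrites the earlier one.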

The script below deals with possible multiple occurrences.

#!/usr/bin/perl

use strict; use warnings;

use File::Slurp;
use Regex::PreSuf;

my $re = presuf( 'first name', 'last name' );

my $string = read_file \*DATA;
$string =~ s/\n+/ /g;

my %matches;

while ( $string =~ /($re):\s+(\S+)/g ) {
    push @{ $matches{ $1 } }, $2;
}

use Data::Dumper;
print Dumper \%matches;

__DATA__
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
eiusmod tempor incididunt ut labore character first name: Han last
name: Solo et dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud character last name: Solo first name: Han exercitation
ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute
irure dolor in reprehenderit in voluptate velit esse cillum
character last name: Solo first name: Han dolore eu fugiat nulla
pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
culpa qui officia deserunt mollit anim id est laborum
Sinan Ünür answered Jan 25 '26 08:01


