
Need help splitting this string of names (first name and last name pairs delimited by commas and "and")

Tags: regex, perl

I'm using Perl and need to split strings of author names that are delimited by commas and a final "and". Each name is a first name followed by a last name, like this:

$string1 = "Joe Smith, Jason Jones, Jane Doe and Jack Jones";
$string2 = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";
$string3 = "Jane Doe and Joe Smith";
# Next line doesn't work because there is no comma between last two names
@data = split(/,/, $string1);

I would just like to split the full names into elements of an array, like what split() would do, so that the @data array would contain, for example:

$data[0]: "Joe Smith"
$data[1]: "Jason Jones"
$data[2]: "Jane Doe"
$data[3]: "Jack Jones"

However, the problem is that there is no comma between the last two names in the lists. Any help would be appreciated.

stackoverflowuser2010 asked Aug 28 '11 02:08



2 Answers

You could use a simple alternation in your regular expression for split:

my @parts = split(/\s*,\s*|\s+and\s+/, $string1);

For example:

$ perl -we 'my $string1 = "Joe Smith, Jason Jones, Jane Doe and Jack Jones";print join("\n",split(/\s*,\s*|\s+and\s+/, $string1)),"\n"'
Joe Smith
Jason Jones
Jane Doe
Jack Jones

$ perl -we 'my $string2 = "Jane Doe and Joe Smith";print join("\n",split(/\s*,\s*|\s+and\s+/, $string2)),"\n"'
Jane Doe
Joe Smith

If you also have to deal with the Oxford comma (i.e. "this, that, and the other thing"), then you could use:

my @parts = split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $string1);

For example:

$ perl -we 'my $s = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";print join("\n",split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s)),"\n"'
Joe Smith
Jason Jones
Jane Doe
Jack Jones

$ perl -we 'my $s = "Joe Smith, Jason Jones, Jane Doe and Jack Jones";print join("\n",split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s)),"\n"'
Joe Smith
Jason Jones
Jane Doe
Jack Jones

$ perl -we 'my $s = "Joe Smith and Jack Jones";print join("\n",split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s)),"\n"'
Joe Smith
Jack Jones

Thanks to stackoverflowuser2010 for noting this case.

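You'll want the \s*,\s*and\s+ branch at the beginning to keep the other branches of the alternation from splitting on the comma or the "and" first. If the plain comma branch matched first, it would consume the whitespace that \s+and\s+ needs, and "and" would stay attached to the last name. This order appears to be guaranteed as well; the perlre documentation says: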

Alternatives are tried from left to right, so the first alternative found for which the entire expression matches, is the one that is chosen.
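
To see why the branch order matters, here is a small sketch that runs both orderings on the Oxford comma string from the question. The variable names (@ordered, @reordered) are only for illustration and are not part of the answer above.

```perl
use strict;
use warnings;

my $s = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";

# Comma-plus-"and" branch first: ", and " is consumed as one delimiter.
my @ordered = split /\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s;

# Plain comma branch first: it wins at the Oxford comma and swallows the
# following space, so \s+and\s+ can no longer match and "and" stays put.
my @reordered = split /\s*,\s*|\s*,\s*and\s+|\s+and\s+/, $s;

print join("|", @ordered), "\n";    # Joe Smith|Jason Jones|Jane Doe|Jack Jones
print join("|", @reordered), "\n";  # Joe Smith|Jason Jones|Jane Doe|and Jack Jones
```

With the comma branch first, the last element comes back as "and Jack Jones", which is the failure the ordering is meant to avoid.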

mu is too short answered Sep 26 '22 13:09



Before calling split, replace each "and" (with its surrounding whitespace) with a comma:

$string1 =~ s{\s+and\s+}{,}g;
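
One thing to watch for: on the Oxford comma string ($string2 in the question), a plain \s+and\s+ substitution would leave two commas in a row, and split would then return an empty element. A possible variation, assuming you want all three example strings handled, is to let the substitution also eat an optional comma before the "and". The helper name names_from is made up for this sketch.

```perl
use strict;
use warnings;

my @inputs = (
    "Joe Smith, Jason Jones, Jane Doe and Jack Jones",
    "Joe Smith, Jason Jones, Jane Doe, and Jack Jones",
    "Jane Doe and Joe Smith",
);

# names_from() is a hypothetical helper, not from the original answer.
sub names_from {
    my ($string) = @_;    # work on a copy so the caller's string is untouched
    # The optional comma lets ", and " collapse to one comma instead of two.
    $string =~ s{\s*,?\s+and\s+}{,}g;
    # Split on commas, dropping any whitespace around them.
    return split /\s*,\s*/, $string;
}

print join("|", names_from($_)), "\n" for @inputs;
```

This should print the four names for the first two strings and the two names for the third, each joined with "|".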
Alan Haggai Alavi answered Sep 26 '22 13:09
