I'm using Perl and need to split strings of author names that are delimited by commas as well as a final "and". Each name consists of a first name and a last name, like this:
$string1 = "Joe Smith, Jason Jones, Jane Doe and Jack Jones";
$string2 = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";
$string3 = "Jane Doe and Joe Smith";
# Next line doesn't work because there is no comma between last two names
@data = split(/,/, $string1);
I would just like to split the full names into elements of an array, like what split() would do, so that the @data array would contain, for example:
$data[0]: "Joe Smith"
$data[1]: "Jason Jones"
$data[2]: "Jane Doe"
$data[3]: "Jack Jones"
However, the problem is that there is no comma between the last two names in the lists. Any help would be appreciated.
You could use a simple alternation in your regular expression for split:
my @parts = split(/\s*,\s*|\s+and\s+/, $string1);
For example:
$ perl -we 'my $string1 = "Joe Smith, Jason Jones, Jane Doe and Jack Jones";print join("\n",split(/\s*,\s*|\s+and\s+/, $string1)),"\n"'
Joe Smith
Jason Jones
Jane Doe
Jack Jones
$ perl -we 'my $string2 = "Jane Doe and Joe Smith";print join("\n",split(/\s*,\s*|\s+and\s+/, $string2)),"\n"'
Jane Doe
Joe Smith
If you also have to deal with the Oxford comma (i.e. "this, that, and the other thing"), then you could use:
my @parts = split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $string1);
For example:
$ perl -we 'my $s = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";print join("\n",split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s)),"\n"'
Joe Smith
Jason Jones
Jane Doe
Jack Jones
$ perl -we 'my $s = "Joe Smith, Jason Jones, Jane Doe and Jack Jones";print join("\n",split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s)),"\n"'
Joe Smith
Jason Jones
Jane Doe
Jack Jones
$ perl -we 'my $s = "Joe Smith and Jack Jones";print join("\n",split(/\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s)),"\n"'
Joe Smith
Jack Jones
Thanks to stackoverflowuser2010 for noting this case.
You'll want the \s*,\s*and\s+ branch at the beginning so that the other branches of the alternation do not split on the comma or the "and" first. This order appears to be guaranteed as well:
Alternatives are tried from left to right, so the first alternative found for which the entire expression matches, is the one that is chosen.
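To see why the branch order matters, here is a small sketch that runs the same three branches in two different orders against the Oxford-comma string. The variable names @good and @bad are only for illustration and are not from the answer above.

```perl
use strict;
use warnings;

my $s = "Joe Smith, Jason Jones, Jane Doe, and Jack Jones";

# Oxford-comma branch first: ", and " is consumed as a single delimiter.
my @good = split /\s*,\s*and\s+|\s*,\s*|\s+and\s+/, $s;

# Plain-comma branch first: ", " is consumed on its own, so the following
# "and " has no leading whitespace left and stays glued to the last name.
my @bad = split /\s*,\s*|\s*,\s*and\s+|\s+and\s+/, $s;

print "good: $_\n" for @good;   # last line: "good: Jack Jones"
print "bad:  $_\n" for @bad;    # last line: "bad:  and Jack Jones"
```

With the plain-comma branch listed first, the last element comes out as "and Jack Jones", which is the failure the ordering is meant to avoid.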
Before calling split, replace "and" with a comma:
$string1 =~ s{\s+and\s+}{,}g;
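The substitution only prepares the string, so a split still has to follow. The sketch below puts both steps together in a hypothetical helper, split_authors, which is not part of the original answer. It also makes two assumptions beyond the one-liner above: the substitution pattern absorbs an optional comma before "and", so the Oxford-comma form does not leave an empty element, and the split pattern discards whitespace around each comma.

```perl
use strict;
use warnings;

# Hypothetical helper: normalise "and" to a comma, then split on commas.
sub split_authors {
    my ($names) = @_;
    # Turn " and " or ", and " into a single comma.
    $names =~ s{\s*,?\s+and\s+}{,}g;
    # Split on commas, dropping any whitespace around them.
    return split /\s*,\s*/, $names;
}

for my $str (
    "Joe Smith, Jason Jones, Jane Doe and Jack Jones",
    "Joe Smith, Jason Jones, Jane Doe, and Jack Jones",
    "Jane Doe and Joe Smith",
) {
    my @data = split_authors($str);
    print join(" | ", @data), "\n";
}
```

Because the pattern requires whitespace on both sides of "and", names that merely contain those letters (for example "Sandy" or "Anderson") are left alone.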