This is a follow-up and complication to this question: Extracting contents of a string within parentheses.
In that question I had the following string --
"Will Farrell (Nick Hasley), Rebecca Hall (Samantha)"
And I wanted to get a list of tuples in the form of (actor, character)
--
[('Will Farrell', 'Nick Hasley'), ('Rebecca Hall', 'Samantha')]
To generalize matters, I have a slightly more complicated string, and I need to extract the same information. The string I have is --
"Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Glenn Howerton (Gary),
with Stephen Root and Laura Dern (Delilah)"
I need to format this as follows:
[('Will Farrell', 'Nick Hasley'), ('Rebecca Hall', 'Samantha'), ('Glenn Howerton', 'Gary'),
('Stephen Root',''), ('Lauren Dern', 'Delilah')]
I know I can replace the filler words (with, and, &, etc.), but can't quite figure out how to add a blank entry -- ''
-- if there is no character name for the actor (in this case Stephen Root). What would be the best way to go about doing this?
Finally, I need to take into account if an actor has multiple roles, and build a tuple for each role the actor has. The final string I have is:
"Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Glenn Howerton (Gary, Brad), with
Stephen Root and Laura Dern (Delilah, Stacy)"
And I need to build a list of tuples as follows:
[('Will Farrell', 'Nick Hasley'), ('Rebecca Hall', 'Samantha'), ('Glenn Howerton', 'Gary'),
('Glenn Howerton', 'Brad'), ('Stephen Root',''), ('Lauren Dern', 'Delilah'), ('Lauren Dern', 'Stacy')]
Thank you.
import re
credits = """Will Ferrell (Nick Halsey), Rebecca Hall (Samantha), Glenn Howerton (Gary, Brad), with
Stephen Root and Laura Dern (Delilah, Stacy)"""
# split on commas (only if outside of parentheses), "with" or "and"
splitre = re.compile(r"\s*(?:,(?![^()]*\))|\bwith\b|\band\b)\s*")
# match the part before the parentheses (1) and what's inside the parens (2)
# (only if parentheses are present)
matchre = re.compile(r"([^(]*)(?:\(([^)]*)\))?")
# split the parts inside the parentheses on commas
splitparts = re.compile(r"\s*,\s*")
characters = splitre.split(credits)
pairs = []
for character in characters:
if character:
match = matchre.match(character)
if match:
actor = match.group(1).strip()
if match.group(2):
parts = splitparts.split(match.group(2))
for part in parts:
pairs.append((actor, part))
else:
pairs.append((actor, ""))
print(pairs)
Output:
[('Will Ferrell', 'Nick Halsey'), ('Rebecca Hall', 'Samantha'),
('Glenn Howerton', 'Gary'), ('Glenn Howerton', 'Brad'), ('Stephen Root', ''),
('Laura Dern', 'Delilah'), ('Laura Dern', 'Stacy')]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With