I have an NFL dataset with a 'description' column with details about the play. Each successful pass and run play has a string that's structured like:
"(12:25) (No Huddle Shotgun) P.Manning pass short left to W.Welker pushed ob at DEN 34 for 10 yards (C.Graham)."
How do I locate/extract the number after "for" in the string, and place it in a new column?
You can use the Series str.extract string method:
In [10]: import pandas as pd
In [11]: df = pd.DataFrame([["(12:25) (No Huddle Shotgun) P.Manning pass short left to W.Welker pushed ob at DEN 34 for 10 yards (C.Graham)."]])
In [12]: df
Out[12]:
0
0 (12:25) (No Huddle Shotgun) P.Manning pass sho...
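This will "extract" what's in the capture group (inside the parentheses). Use a raw string so that \d reaches the regex engine unchanged, and pass expand=False to get a Series back rather than a one-column DataFrame:
In [13]: df[0].str.extract(r"for (\d+)", expand=False)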
Out[13]:
0 10
Name: 0, dtype: object
In [14]: df[0].str.extract(r"for (\d+) yards", expand=False)
Out[14]:
0 10
Name: 0, dtype: object
The extracted values are strings (note dtype: object), so you'll need to convert them to int, e.g. using astype(int).
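To put the number in a new column, as the question asks, something like the following minimal sketch should work. The column name 'description' comes from the question; the new column name 'yards', the second sample row (a play with no yardage), and the optional minus sign in the pattern are assumptions added for illustration. Rows that don't match come back as missing values, so the sketch uses the nullable Int64 dtype instead of a plain astype(int), which would fail on them.

```python
import pandas as pd

df = pd.DataFrame({
    "description": [
        "(12:25) (No Huddle Shotgun) P.Manning pass short left to W.Welker "
        "pushed ob at DEN 34 for 10 yards (C.Graham).",
        # Hypothetical row with no "for N yards" part, to show the no-match case.
        "(11:40) P.Manning pass incomplete deep right.",
    ]
})

# expand=False returns a Series holding the single capture group.
# The optional "-" is an assumption, in case losses are written as "for -3 yards".
extracted = df["description"].str.extract(r"for (-?\d+) yards", expand=False)

# Convert the strings to numbers; unmatched rows stay missing (<NA>).
df["yards"] = pd.to_numeric(extracted).astype("Int64")

print(df["yards"])
```

If every row is guaranteed to match, astype(int) on the extracted Series is enough.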