 

Extract a certain part of a string after a key phrase using pandas?

I have an NFL dataset with a 'description' column with details about the play. Each successful pass and run play has a string that's structured like:

"(12:25) (No Huddle Shotgun) P.Manning pass short left to W.Welker pushed ob at DEN 34 for 10 yards (C.Graham)."

How do I locate/extract the number after "for" in the string, and place it in a new column?

mlaugh4 asked Dec 20 '22 17:12



1 Answer

You can use the Series str.extract string method:

In [11]: df = pd.DataFrame([["(12:25) (No Huddle Shotgun) P.Manning pass short left to W.Welker pushed ob at DEN 34 for 10 yards (C.Graham)."]])

In [12]: df
Out[12]:
                                                   0
0  (12:25) (No Huddle Shotgun) P.Manning pass sho...

This will "extract" what's in the capture group (inside the parentheses):

In [13]: df[0].str.extract(r"for (\d+)", expand=False)
Out[13]:
0    10
Name: 0, dtype: object

In [14]: df[0].str.extract(r"for (\d+) yards", expand=False)
Out[14]:
0    10
Name: 0, dtype: object

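The extracted values are strings (hence dtype: object), so you'll need to convert them to int, e.g. using astype(int). Rows where the pattern doesn't match come back as NaN, and astype(int) will raise on those.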
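To put the result in a new column, as the question asks, something along these lines should work. This is a minimal sketch: the column name "yards" and the second (non-matching) sample row are made up for illustration, and the nullable "Int64" dtype is just one way to keep integers alongside missing values.

```python
import pandas as pd

# Hypothetical sample data: the first row is from the question,
# the second is invented to show a play with no "for N yards" phrase.
df = pd.DataFrame({
    "description": [
        "(12:25) (No Huddle Shotgun) P.Manning pass short left to W.Welker "
        "pushed ob at DEN 34 for 10 yards (C.Graham).",
        "(11:40) P.Manning pass incomplete short right to D.Thomas.",
    ]
})

# expand=False returns a Series when the pattern has a single capture group.
yards = df["description"].str.extract(r"for (\d+) yards", expand=False)

# The extracted values are strings and non-matching rows are NaN, so convert
# to numbers first and then to the nullable integer dtype.
df["yards"] = pd.to_numeric(yards).astype("Int64")

print(df["yards"])
```

If every row is guaranteed to match, a plain astype(int) on the extracted Series is enough.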

Andy Hayden answered Jan 06 '23 07:01
