
Extracting Year and Month from a custom text field

I have a data frame with a column that records how many years and months each person has held an account with the organization.

The field is in a custom text format, e.g. '0yrs 11mon', '15yrs 4mon', etc.

Is there a way to extract the years and months into separate columns, like below?

Member Since | Year | Mon
-------------|------|----
'0yrs 11mon' | 0    | 11
'15yrs 4mon' | 15   | 4
siva asked Jan 27 '23 11:01


1 Answer

You can use Series.str.extract:

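# Raw strings keep \d and \s from being treated as Python string escapes.
# expand=False returns a Series when the pattern has a single capture group.
df['Year'] = df['Member Since'].str.extract(r'(\d+)(?:yrs)', expand=False)
df['Mon'] = df['Member Since'].str.extract(r'(?:\s)(\d+)(?:mon)', expand=False)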

The ?: marks a non-capturing group: that part of the pattern must match, but it is not returned in the output. Only the (\d+) group is captured.
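As a minimal sketch of the same idea, both numbers can be pulled out with one pattern that has two capture groups. The sample frame below is rebuilt from the two values in the question, and the conversion to integers is an addition that assumes every row matches the pattern:

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about the real frame).
df = pd.DataFrame({"Member Since": ["0yrs 11mon", "15yrs 4mon"]})

# Two capture groups give a DataFrame with one column per group (0 and 1).
parts = df["Member Since"].str.extract(r"(\d+)yrs\s+(\d+)mon")

# Captured values are strings; convert them so the new columns are numeric.
# This conversion fails if a row does not match, since it would hold NaN.
df["Year"] = parts[0].astype(int)
df["Mon"] = parts[1].astype(int)

print(df)
```

Here the literal text yrs and mon sits outside the parentheses, so no non-capturing groups are needed.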

You can also try Series.str.extractall:

df[['year', 'mon']] = df['Member Since'].str.extractall(r'(\d+)').unstack()
df
  Member Since  year    mon
0   0yrs 11mon   0      11
1   15yrs 4mon   15      4

extractall returns a DataFrame with one row per match and one column per capture group, so you need to unstack it to get the matches for each original row side by side.

Use extractall only if every value contains exactly two numbers, as in your question. Otherwise the unstacked result may not have exactly two columns, and the assignment will raise an error.
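The shape of the extractall result, and a guard for the two-numbers requirement, can be sketched as follows. The sample frame is rebuilt from the question, and the explicit check is an illustrative addition rather than part of the original answer:

```python
import pandas as pd

# Sample data rebuilt from the question (an assumption about the real frame).
df = pd.DataFrame({"Member Since": ["0yrs 11mon", "15yrs 4mon"]})

# One row per match, indexed by (original row label, match number).
matches = df["Member Since"].str.extractall(r"(\d+)")

# Take the single capture-group column and pivot the match number into columns.
wide = matches[0].unstack()

# Illustrative guard: every row must yield exactly two numbers.
if len(wide) != len(df) or wide.shape[1] != 2 or wide.isna().any().any():
    raise ValueError("expected exactly two numbers in every value")

df["year"] = wide[0].astype(int)
df["mon"] = wide[1].astype(int)

print(df)
```

Checking the shape before assigning turns a confusing length-mismatch error into an explicit message about the data.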

Mohit Motwani answered Jan 30 '23 22:01