This is my data frame:
index duration
1 7 year
2 2day
3 4 week
4 8 month
I need to separate the numbers from the time units and put them in two new columns. The desired output looks like this:
index duration number time
1 7 year 7 year
2 2day 2 day
3 4 week 4 week
4 8 month 8 month
This is my code:
df ['numer'] = df.duration.replace(r'\d.*' , r'\d', regex=True, inplace = True)
df [ 'time']= df.duration.replace (r'\.w.+',r'\w.+', regex=True, inplace = True )
But it does not work. Any suggestions?
I also need to create another column based on the values of time column. So the new dataset is like this:
index duration number time time_day
1 7 year 7 year 365
2 2day 2 day 1
3 4 week 4 week 7
4 8 month 8 month 30
df['time_day']= df.time.replace(r'(year|month|week|day)', r'(365|30|7|1)', regex=True, inplace=True)
Any suggestions?
We can use Series.str.extract here:
In [67]: df[['number','time']] = df.duration.str.extract(r'(\d+)\s*(.*)', expand=True)
In [68]: df
Out[68]:
index duration number time
0 1 7 year 7 year
1 2 2day 2 day
2 3 4 week 4 week
3 4 8 month 8 month
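Regex explained: (\d+) captures one or more digits, \s* skips any optional whitespace, and (.*) captures the rest of the string (the time unit). regex101.com is, IMO, one of the best online regex parsers, testers and explainers.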
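As a self-contained variation on the extract call above, here is a minimal sketch that uses named capture groups so the new columns get their names directly from the pattern. The sample frame is rebuilt from the question; using named groups and `join` is my own choice here, not part of the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"duration": ["7 year", "2day", "4 week", "8 month"]})

# Named groups become the column names of the extracted frame:
#   (?P<number>\d+)  one or more digits
#   \s*              optional whitespace between number and unit
#   (?P<time>.*)     the rest of the string (the unit)
parts = df["duration"].str.extract(r"(?P<number>\d+)\s*(?P<time>.*)")

# Attach the two new columns to the original frame
df = df.join(parts)
print(df)
```

Note that `number` is still a string column at this point, so it needs a separate conversion before doing arithmetic on it.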
You may also want to convert the number column to an integer dtype:
In [69]: df['number'] = df['number'].astype(int)
In [70]: df.dtypes
Out[70]:
index int64
duration object
number int32
time object
dtype: object
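One caveat: if some rows do not match the pattern, extract leaves missing values there and astype(int) will raise. A possible workaround, sketched under the assumption that a nullable integer column is acceptable, is `pd.to_numeric` followed by the `Int64` dtype. The series below is made-up illustration data, not from the question.

```python
import pandas as pd

# Hypothetical extracted values, with one row that failed to match
raw = pd.Series(["7", "2", None, "8"])

# errors="coerce" turns anything unparseable into NaN instead of raising;
# the nullable "Int64" dtype keeps integers while allowing missing values
numbers = pd.to_numeric(raw, errors="coerce").astype("Int64")
print(numbers)
```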
UPDATE: to build the time_day column, replace each unit with its number of days:
In [167]: df['time_day'] = df['time'].replace(['year','month','week','day'], [365, 30, 7, 1], regex=True)
In [168]: df
Out[168]:
index duration number time time_day
0 1 7 year 7 year 365
1 2 2day 2 day 1
2 3 4 week 4 week 7
3 4 8 month 8 month 30
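An alternative to the regex-based replace is a plain dictionary lookup with Series.map. This is only a sketch: the `days_per_unit` name is mine, the day counts are the approximations used in the question, and it assumes the time column holds exactly one of the four unit strings (anything else would become NaN).

```python
import pandas as pd

# Frame as it looks after the extract step
df = pd.DataFrame({
    "number": [7, 2, 4, 8],
    "time": ["year", "day", "week", "month"],
})

# Hypothetical lookup table; same approximate values as in the question
days_per_unit = {"year": 365, "month": 30, "week": 7, "day": 1}

# Strip stray whitespace, then look each unit up in the dictionary
df["time_day"] = df["time"].str.strip().map(days_per_unit)
print(df)
```

Because map does an exact lookup rather than a pattern match, the resulting column is numeric from the start and unexpected units show up as missing values instead of passing through unchanged.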