I am new to Python and programming (so please go easy) and hope someone can help. I have bike trip durations stored as dtype: object:
Duration
14h 26min. 2sec.
0h 8min. 34sec.
0h 12min. 17sec.
I would ideally like to create a new column holding the calculated duration in minutes as an integer, so hours need multiplying by 60 and seconds need rounding to the nearest minute. I would then have:
Duration
866
9
12
I am having trouble at the first hurdle of splitting and getting just the digits. I have had some joy in splitting with this:
def ConvertDuration(Minutes):
    return Minutes.split(' ')[0].split('.')[1].strip()
WashBike['DurationMin'] = pd.DataFrame({'Duration':WashBike['Duration'].apply(ConvertDuration)})
I can play around with positions and create one column each for h, min and sec.
However, the string characters will remain. Would I need to do another split to further separate and remove the characters?
I have also tried to strip the characters with the following:
WashBike['DurationInt'] = WashBike['Duration'].str.strip(' ').str.strip('.').str.strip('hHmMiInNsSeEcC')
I have not been able to get to the stage of putting the values into a single figure minute unit measurement. I am thinking of doing something like this:
WashBike['DurationMn'] = WashBike['Duration'].split(' ').apply(lambda x: int(x[0]) * 60 + int(x[1] + int(x[2].round()) ))
Although I am not able to get this far.
I have spent 2 days looking through Stack Overflow and other sites. I have found plenty about datetime etc. While I have tried to convert Duration to hh:mm:ss, I'm not sure if it's the correct course. Any help and advice would be much appreciated.
Ken
This is straightforward frequency conversion, see here
In [16]: df = pd.DataFrame({'Duration': ['4h 26min. 2sec.',
'0h 8min. 34sec.',
'0h 12min. 17sec.']})
In [17]: df
Out[17]:
Duration
0 4h 26min. 2sec.
1 0h 8min. 34sec.
2 0h 12min. 17sec.
These are almost in a standard form that pd.to_timedelta can parse; just remove the '.' characters
In [18]: pd.to_timedelta(df.Duration.str.replace(r'\.', '', regex=True))
Out[18]:
0 04:26:02
1 00:08:34
2 00:12:17
Name: Duration, dtype: timedelta64[ns]
A float result in minutes
In [19]: pd.to_timedelta(df.Duration.str.replace(r'\.', '', regex=True)) / np.timedelta64(1,'m')
Out[19]:
0 266.033333
1 8.566667
2 12.283333
Name: Duration, dtype: float64
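The question asks for whole minutes with the seconds rounded (866, 9 and 12 for the sample data in the question). A minimal sketch of one way to get there from the float result, assuming every string follows the "Xh Ymin. Zsec." pattern; the column name DurationMin comes from the question, and regex=True is passed explicitly so the pattern is treated as a regular expression regardless of the pandas default:

```python
import pandas as pd

# Sample data from the question (note the first row is 14h, not 4h)
df = pd.DataFrame({'Duration': ['14h 26min. 2sec.',
                                '0h 8min. 34sec.',
                                '0h 12min. 17sec.']})

# Drop the literal '.' so the strings become '14h 26min 2sec', etc.
cleaned = df['Duration'].str.replace(r'\.', '', regex=True)
td = pd.to_timedelta(cleaned)

# Total seconds -> minutes, rounded to the nearest minute, stored as int
df['DurationMin'] = (td.dt.total_seconds() / 60).round().astype(int)
print(df['DurationMin'].tolist())
```

Rounding before the integer cast is what turns 8min 34sec into 9 rather than 8.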
This truncates
In [20]: pd.to_timedelta(df.Duration.str.replace(r'\.', '', regex=True)).astype('timedelta64[m]')
Out[20]:
0 266
1 8
2 12
Name: Duration, dtype: float64
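The astype('timedelta64[m]') call reflects the pandas version this answer was written against; newer releases may not accept minute resolution in astype. A hedged sketch of an equivalent truncation that should not depend on that, using floor division by a one-minute Timedelta (the variable names here are illustrative only):

```python
import pandas as pd

df = pd.DataFrame({'Duration': ['4h 26min. 2sec.',
                                '0h 8min. 34sec.',
                                '0h 12min. 17sec.']})

# Remove the literal '.' characters, then parse into timedeltas
td = pd.to_timedelta(df['Duration'].str.replace(r'\.', '', regex=True))

# Floor-dividing by one minute counts whole minutes, i.e. it truncates
whole_minutes = td // pd.Timedelta(minutes=1)
print(whole_minutes.tolist())
```

Unlike the float64 result shown in the transcript, this gives an integer Series.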