
Python: converting Trip duration of h min sec and leave only minute count

I am new to Python and programming (so please go easy) and hope someone can help. I have bike trip durations stored as dtype: object:

Duration
14h 26min. 2sec.
0h 8min. 34sec.
0h 12min. 17sec.

I would ideally like to create a new column holding the calculated duration in minutes as an integer. So the hours need multiplying by 60, and the seconds need rounding to the nearest minute. I would then have:

Duration
866
9
12

I am having trouble at the first hurdle of splitting and getting just the digits. I have had some joy in splitting with this:

def ConvertDuration(Minutes):
    return Minutes.split(' ')[0].split('.')[1].strip()
WashBike['DurationMin'] = pd.DataFrame({'Duration':WashBike['Duration'].apply(ConvertDuration)})

I can play around with positions and create one column each for h, min and sec. However, the string characters will remain. Would I need to do another split to further separate and remove the characters?
I have also tried to strip the characters with the following:

WashBike['DurationInt'] = WashBike['Duration'].str.strip(' ').str.strip('.').str.strip('hHmMiInNsSeEcC')

I have not been able to get to the stage of combining the values into a single figure in minutes. I am thinking of doing something like this:

WashBike['DurationMn'] = WashBike['Duration'].split(' ').apply(lambda x: int(x[0]) * 60 + int(x[1] + int(x[2].round()) ))

Although I am not able to get this far.

I have spent 2 days looking through Stack Overflow and other sites. I have found plenty about datetime etc. While I have tried to convert Duration to hh:mm:ss, I'm not sure if it's the correct course. Any help and advice would be much appreciated.

Ken

Ken Lawlor asked Apr 30 '15 19:04



1 Answer

This is straightforward frequency conversion, see here

In [16]: df = pd.DataFrame({'Duration': ['4h 26min. 2sec.',
                                        '0h 8min. 34sec.',
                                        '0h 12min. 17sec.']})

In [17]: df
Out[17]: 
           Duration
0   4h 26min. 2sec.
1   0h 8min. 34sec.
2  0h 12min. 17sec.

These are almost in standard form, just zonk the .

In [18]: pd.to_timedelta(df.Duration.str.replace('.', '', regex=False))
Out[18]: 
0   04:26:02
1   00:08:34
2   00:12:17
Name: Duration, dtype: timedelta64[ns]

A float result in minutes

In [19]: pd.to_timedelta(df.Duration.str.replace('.', '', regex=False)) / np.timedelta64(1,'m')
Out[19]: 
0    266.033333
1      8.566667
2     12.283333
Name: Duration, dtype: float64
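
The question asks for whole minutes with the seconds rounded, so one possible follow-up is to round the float result before casting to an integer. This is only a minimal sketch, assuming a pandas version where `str.replace` accepts `regex=False`. The `DurationMin` column name is borrowed from the question, and the frame is the same sample as above.

```python
import pandas as pd

df = pd.DataFrame({'Duration': ['4h 26min. 2sec.',
                                '0h 8min. 34sec.',
                                '0h 12min. 17sec.']})

# Drop the literal dots so each string looks like '4h 26min 2sec'
td = pd.to_timedelta(df['Duration'].str.replace('.', '', regex=False))

# Total seconds -> minutes, rounded to the nearest minute, stored as int
df['DurationMin'] = (td.dt.total_seconds() / 60).round().astype(int)

print(df)
```

With the sample data, 8min 34sec rounds up to 9 while 12min 17sec rounds down to 12.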

This truncates to whole minutes

In [20]: pd.to_timedelta(df.Duration.str.replace('.', '', regex=False)) // pd.Timedelta(minutes=1)
Out[20]: 
0    266
1      8
2     12
Name: Duration, dtype: int64
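
If you would rather stay close to the split-and-apply approach from the question, a regular expression can pull the three numbers out in one go. This is a rough sketch, not part of the timedelta method above: `duration_to_minutes` is a hypothetical helper name, and it assumes every value follows the `Nh Nmin. Nsec.` pattern shown in the question.

```python
import re

# Hypothetical helper: matches strings such as '14h 26min. 2sec.'
# and captures the hour, minute and second counts.
_PATTERN = re.compile(r'(\d+)\s*h\s+(\d+)\s*min\.?\s+(\d+)\s*sec\.?')

def duration_to_minutes(text):
    """Return the duration in whole minutes, rounded to the nearest minute."""
    match = _PATTERN.search(text)
    if match is None:
        raise ValueError(f'unrecognised duration: {text!r}')
    hours, minutes, seconds = (int(g) for g in match.groups())
    # Work in whole seconds, then round half a minute upwards
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return (total_seconds + 30) // 60

for sample in ['14h 26min. 2sec.', '0h 8min. 34sec.', '0h 12min. 17sec.']:
    print(sample, '->', duration_to_minutes(sample))
```

It could then be applied to the column from the question with something like `WashBike['Duration'].apply(duration_to_minutes)`.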
Jeff answered Oct 10 '22 23:10
