I am trying to convert a dataframe column to timedelta values but am having issues. The values in the column look like '+XX:XX:XX' or '-XX:XX:XX'.
My dataframe:
import pandas as pd
df = pd.DataFrame({'time': ['+06:00:00', '-04:00:00']})
My approach:
df['time'] = pd.Timedelta(df['time'])
However, I get the error:
ValueError: Value must be Timedelta, string, integer, float, timedelta or convertible
When I do a simpler example:
time = pd.Timedelta('+06:00:00')
I get my desired output:
Timedelta('0 days 06:00:00')
What would be the approach if I wanted to convert a series into a timedelta with my desired output?
I would strongly recommend using the purpose-built, vectorized (i.e. very fast) method, pd.to_timedelta():
In [40]: pd.to_timedelta(df['time'])
Out[40]:
0 06:00:00
1 -1 days +20:00:00
Name: time, dtype: timedelta64[ns]
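To get the converted values back into the dataframe, the result can be assigned to the column. The following is a minimal sketch, not part of the original answer; the extra 'not a time' row is a made-up value added only to show how errors='coerce' handles strings that cannot be parsed.

```python
import pandas as pd

# 'not a time' is a hypothetical bad value, added for illustration only
df = pd.DataFrame({'time': ['+06:00:00', '-04:00:00', 'not a time']})

# errors='coerce' turns unparseable strings into NaT instead of raising
df['time'] = pd.to_timedelta(df['time'], errors='coerce')

print(df)
```

Without errors='coerce' the default behaviour is to raise on the bad row, which may be preferable if malformed input should not pass silently.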
Timing against a 200K-row DataFrame:
In [41]: df = pd.concat([df] * 10**5, ignore_index=True)
In [42]: df.shape
Out[42]: (200000, 1)
In [43]: %timeit pd.to_timedelta(df['time'])
1 loop, best of 3: 891 ms per loop
In [44]: %timeit df['time'].apply(pd.Timedelta)
1 loop, best of 3: 7.15 s per loop
In [45]: %timeit [pd.Timedelta(x) for x in df['time']]
1 loop, best of 3: 5.52 s per loop
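The %timeit lines above only work inside IPython. As a rough sketch of the same comparison in a plain script, the standard timeit module can be used. The frame here is deliberately smaller than the 200K-row one so that it runs quickly, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Smaller than the 200K-row frame above, so the sketch finishes quickly
df = pd.DataFrame({'time': ['+06:00:00', '-04:00:00'] * 1000})

vectorized = pd.to_timedelta(df['time'])
applied = df['time'].apply(pd.Timedelta)
looped = pd.Series([pd.Timedelta(x) for x in df['time']])

# All three approaches should produce the same values
same_values = bool((vectorized == applied).all() and (vectorized == looped).all())

# Time each approach over a few repetitions
t_vectorized = timeit.timeit(lambda: pd.to_timedelta(df['time']), number=3)
t_applied = timeit.timeit(lambda: df['time'].apply(pd.Timedelta), number=3)
t_looped = timeit.timeit(lambda: [pd.Timedelta(x) for x in df['time']], number=3)

print('same values:', same_values)
print('to_timedelta:', t_vectorized)
print('apply:', t_applied)
print('list comprehension:', t_looped)
```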
The error is pretty clear:
ValueError: Value must be Timedelta, string, integer, float, timedelta or convertible
What you are passing to pd.Timedelta() is none of the above data types; it is a whole Series:
>>> type(df['time'])
<class 'pandas.core.series.Series'>
What you probably want is:
>>> [pd.Timedelta(x) for x in df['time']]
[Timedelta('0 days 06:00:00'), Timedelta('-1 days +20:00:00')]
Or:
>>> df['time'].apply(pd.Timedelta)
0 06:00:00
1 -1 days +20:00:00
Name: time, dtype: timedelta64[ns]
See more examples in the docs.
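A note on the output above: '-1 days +20:00:00' is how pandas displays a negative timedelta, and it is the same duration as minus four hours. The sketch below is not from the original answers; it shows one way to check this and to get signed hours as plain numbers, in case that is closer to the desired output.

```python
import pandas as pd

s = pd.to_timedelta(pd.Series(['+06:00:00', '-04:00:00']))

# '-1 days +20:00:00' is just the display form of minus four hours
is_minus_four_hours = s.iloc[1] == pd.Timedelta(hours=-4)

# Signed hours as floats, via the .dt accessor
hours = s.dt.total_seconds() / 3600

print(is_minus_four_hours)
print(hours.tolist())  # [6.0, -4.0]
```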