Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

timedelta to string type in pandas dataframe

I have a dataframe df and its first column is timedelta64

df.info():

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 686 entries, 0 to 685
Data columns (total 6 columns):
0    686 non-null timedelta64[ns]
1    686 non-null object
2    686 non-null object
3    686 non-null object
4    686 non-null object
5    686 non-null object

If I print(df[0][2]), for example, it will give me 0 days 05:01:11. However, I don't want the 0 days filed. I only want 05:01:11 to be printed. Could someone teaches me how to do this? Thanks so much!

like image 383
Chenrui Su Avatar asked Jun 29 '18 12:06

Chenrui Su


3 Answers

It is possible by:

df['duration1'] = df['duration'].astype(str).str[-18:-10]

But solution is not general, if input is 3 days 05:01:11 it remove 3 days too.

So solution working only for timedeltas less as one day correctly.

More general solution is create custom format:

N = 10
np.random.seed(11230)
rng = pd.date_range('2017-04-03 15:30:00', periods=N, freq='13.5H')
df = pd.DataFrame({'duration': np.abs(np.random.choice(rng, size=N) - 
                                 np.random.choice(rng, size=N)) })  

df['duration1'] = df['duration'].astype(str).str[-18:-10]

def f(x):
    ts = x.total_seconds()
    hours, remainder = divmod(ts, 3600)
    minutes, seconds = divmod(remainder, 60)
    return ('{}:{:02d}:{:02d}').format(int(hours), int(minutes), int(seconds)) 

df['duration2'] = df['duration'].apply(f)
print (df)

         duration duration1  duration2
0 2 days 06:00:00  06:00:00   54:00:00
1 2 days 19:30:00  19:30:00   67:30:00
2 1 days 03:00:00  03:00:00   27:00:00
3 0 days 00:00:00  00:00:00    0:00:00
4 4 days 12:00:00  12:00:00  108:00:00
5 1 days 03:00:00  03:00:00   27:00:00
6 0 days 13:30:00  13:30:00   13:30:00
7 1 days 16:30:00  16:30:00   40:30:00
8 0 days 00:00:00  00:00:00    0:00:00
9 1 days 16:30:00  16:30:00   40:30:00
like image 179
jezrael Avatar answered Sep 27 '22 20:09

jezrael


Here's a short and robust version using apply():

df['timediff_string'] = df['timediff'].apply(
    lambda x: f'{x.components.hours:02d}:{x.components.minutes:02d}:{x.components.seconds:02d}'
              if not pd.isnull(x) else ''
)

This leverages the components attribute of pandas Timedelta objects and also handles empty values (NaT).

If the timediff column does not contain pandas Timedelta objects, you can convert it:

df['timediff'] = pd.to_timedelta(df['timediff'])
like image 26
Simon G. Avatar answered Sep 27 '22 18:09

Simon G.


datetime.timedelta already formats the way you'd like. The crux of this issue is that Pandas internally converts to numpy.timedelta.

import pandas as pd
from datetime import timedelta

time_1 = timedelta(days=3, seconds=3400)
time_2 = timedelta(days=0, seconds=3400)
print(time_1)
print(time_2)

times = pd.Series([time_1, time_2])

# Times are converted to Numpy timedeltas.
print(times)

# Convert to string after converting to datetime.timedelta.
times = times.apply(
    lambda numpy_td: str(timedelta(seconds=numpy_td.total_seconds())))

print(times)

So, convert to a datetime.timedelta and then str (to prevent conversion back to numpy.timedelta) before printing.

3 days, 0:56:40
0:56:400

0   3 days 00:56:40
1   0 days 00:56:40
dtype: timedelta64[ns]

0    3 days, 0:56:40
1            0:56:40
dtype: object

I came here looking for answers to the same question, so I felt I should add further clarification. : )

like image 23
jayreed1 Avatar answered Sep 27 '22 20:09

jayreed1