
Pandas - convert strings to time without date

Tags: python, pandas

I've read loads of SO answers but can't find a clear solution.

I have this data in a df called day1 which represents hours:

1    10:53
2    12:17
3    14:46
4    16:36
5    18:39
6    20:31
7    22:28
Name: time, dtype: object

I want to convert it into a time format. But when I do this:

day1.time = pd.to_datetime(day1.time, format='H%:M%')

The result includes today's date:

1   2015-09-03 10:53:00
2   2015-09-03 12:17:00
3   2015-09-03 14:46:00
4   2015-09-03 16:36:00
5   2015-09-03 18:39:00
6   2015-09-03 20:31:00
7   2015-09-03 22:28:00
Name: time, dtype: datetime64[ns]

It seems the format argument isn't working - how do I get the time as shown here without the date?


Update

The following formats the time correctly, but somehow the column is still an object type. Why doesn't it convert to datetime64?

day1['time'] = pd.to_datetime(day1['time'], format='%H:%M').dt.time

1    10:53:00
2    12:17:00
3    14:46:00
4    16:36:00
5    18:39:00
6    20:31:00
7    22:28:00
Name: time, dtype: object

RDJ asked Sep 03 '15 12:09




3 Answers

After performing the conversion you can use the datetime accessor dt to access just the hour or time component:

In [51]:

df['hour'] = pd.to_datetime(df['time'], format='%H:%M').dt.hour
df
Out[51]:
        time  hour
index             
1      10:53    10
2      12:17    12
3      14:46    14
4      16:36    16
5      18:39    18
6      20:31    20
7      22:28    22

Also, your format string 'H%:M%' is malformed: the % must come before each directive letter, as in '%H:%M'. As written, it is likely to raise ValueError: ':' is a bad directive in format 'H%:M%'
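A minimal sketch of the directive syntax, using the standard library's datetime.strptime, whose format codes the format argument of pd.to_datetime follows. The sample string is taken from the question; the variable names are only illustrative.

```python
from datetime import datetime

# Correct: the percent sign comes before each directive letter.
parsed = datetime.strptime("10:53", "%H:%M")
print(parsed.time())  # 10:53:00

# Malformed: 'H%:M%' puts the percent sign after the letter,
# so the parser reads ':' as the directive and rejects it.
try:
    datetime.strptime("10:53", "H%:M%")
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```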

Regarding your update: after .dt.time the elements are datetime.time objects, not datetime64 values, so pandas stores the column with dtype object:

In [53]:
df['time'].iloc[0]

Out[53]:
datetime.time(10, 53)
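To see the two dtypes side by side, here is a small sketch. It assumes a Series built from the first few values in the question; raw, parsed and as_time are illustrative names, not part of the original code.

```python
import pandas as pd

raw = pd.Series(["10:53", "12:17", "14:46"], name="time")

parsed = pd.to_datetime(raw, format="%H:%M")  # datetime64 column (date plus time)
as_time = parsed.dt.time                      # plain Python datetime.time objects

print(parsed.dtype)
print(as_time.dtype)          # object
print(type(as_time.iloc[0]))  # <class 'datetime.time'>
```

The object dtype is how pandas holds arbitrary Python objects, which is why the column no longer reports datetime64 after .dt.time.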
EdChum answered Oct 23 '22 07:10



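You can use to_timedelta, which gives a timedelta64 column instead of Python objects (df here is the Series of strings, and ':00' appends the missing seconds):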

pd.to_timedelta(df+':00')
Out[353]: 
1   10:53:00
2   12:17:00
3   14:46:00
4   16:36:00
5   18:39:00
6   20:31:00
7   22:28:00
Name: Time, dtype: timedelta64[ns]
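A sketch of what the timedelta64 representation allows, assuming the same HH:MM strings as in the question. The names times, td, after_noon, gaps and hours are illustrative.

```python
import pandas as pd

times = pd.Series(["10:53", "12:17", "14:46"], name="time")

# Append seconds so every string has the HH:MM:SS form.
td = pd.to_timedelta(times + ":00")

# Vectorised comparison against a cut-off of 12:00.
after_noon = td > pd.Timedelta(hours=12)

# Gap between consecutive rows, and the time of day as fractional hours.
gaps = td.diff()
hours = td.dt.total_seconds() / 3600

print(after_noon.tolist())  # [False, True, True]
print(gaps.iloc[1])         # 0 days 01:24:00
```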
BENY answered Oct 23 '22 05:10



I also struggled with this problem recently. My method is close to EdChum's, and the result is the same as in YOBEN_S's to_timedelta answer.

As EdChum illustrated, dt.time gives you datetime.time objects (dt.hour gives plain integers). A column of datetime.time objects is probably only good for display: I could barely do any calculation on them, since datetime.time does not support subtraction, for example. So if you need further comparisons or calculations on the result column, it's better to avoid this representation.
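A small illustration with the standard library: datetime.time objects support ordering comparisons but not arithmetic, whereas timedelta values support both. The two clock times are taken from the question's data.

```python
from datetime import time, timedelta

start, end = time(10, 53), time(12, 17)

print(start < end)  # True

# Subtracting two datetime.time objects is not defined.
try:
    end - start
except TypeError as exc:
    subtraction_error = str(exc)
    print("cannot subtract:", subtraction_error)

# The same clock times expressed as durations since midnight can be subtracted.
gap = timedelta(hours=12, minutes=17) - timedelta(hours=10, minutes=53)
print(gap)  # 1:24:00
```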

My method is simply to subtract the date from the to_datetime result:

c = pd.Series(['10:23', '12:17', '14:46'])
pd.to_datetime(c, format='%H:%M') - pd.to_datetime(c, format='%H:%M').dt.normalize()

The result is

0   10:23:00
1   12:17:00
2   14:46:00
dtype: timedelta64[ns]

dt.normalize() sets the time component of every value to 00:00:00 while keeping the datetime64 dtype, so only the date part remains. Subtracting it from the original timestamps therefore leaves just the time of day as a timedelta64 column, which you can do calculations with.
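To make the intermediate steps visible, here is a sketch that splits the one-liner above into named parts. The names stamps, midnight and time_of_day are illustrative.

```python
import pandas as pd

c = pd.Series(["10:23", "12:17", "14:46"])

stamps = pd.to_datetime(c, format="%H:%M")  # full timestamps
midnight = stamps.dt.normalize()            # same dates, time set to 00:00:00
time_of_day = stamps - midnight             # elapsed time since midnight

print(time_of_day.dtype)

# Element-wise check against parsing the strings as durations directly.
same_as_to_timedelta = (time_of_day == pd.to_timedelta(c + ":00")).all()
print(same_as_to_timedelta)
```

Parsing once and reusing stamps also avoids calling to_datetime twice on the same column.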

My answer is by no means better than the other two. I just want to provide a different approach and hope it helps.

Bowen Liu answered Oct 23 '22 07:10
