I am reading some data and creating a DataFrame with from_records. The data contains a text timestamp in the form HH:MM:SS:000000, which I can convert to datetimes with pd.to_datetime(data.timestamp, format='%H:%M:%S:%f'). I know the date of the file from the filename. What is a pythonic and performant way to insert the date (and eventually set the result as the index)?
Data looks like:
12:00:00:000000 100
12:00:01:123456 200
12:00:02:000000 300
Without the date inserted I get a dataframe that looks like:
1900-01-01 12:00:00.000000 100
1900-01-01 12:00:01.123456 200
1900-01-01 12:00:02.000000 300
And what I'd want is (given date = datetime.date(2017, 6, 28)):
2017-06-28 12:00:00.000000 100
2017-06-28 12:00:01.123456 200
2017-06-28 12:00:02.000000 300
The origin argument of pd.to_datetime sounded like what I want, but it requires the input to be a numeric timestamp rather than a string.
You can format the date as a string with strftime and prepend it to the time column:
df['datetime'] = pd.to_datetime(date.strftime('%Y-%m-%d ') + df['time'],
format='%Y-%m-%d %H:%M:%S:%f')
print (df)
time A datetime
0 12:00:00:000000 100 2017-06-28 12:00:00.000000
1 12:00:01:123456 200 2017-06-28 12:00:01.123456
2 12:00:02:000000 300 2017-06-28 12:00:02.000000
And for index:
df.index = pd.to_datetime(date.strftime('%Y-%m-%d ') + df['time'],
format='%Y-%m-%d %H:%M:%S:%f')
print (df)
time A
time
2017-06-28 12:00:00.000000 12:00:00:000000 100
2017-06-28 12:00:01.123456 12:00:01:123456 200
2017-06-28 12:00:02.000000 12:00:02:000000 300
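For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample records, the column names time and A, and the choice to drop the raw text column are assumptions made for illustration; in practice the date would come from the filename.

```python
import datetime
import pandas as pd

# Hypothetical sample mirroring the question's data; column names are assumptions.
records = [
    ("12:00:00:000000", 100),
    ("12:00:01:123456", 200),
    ("12:00:02:000000", 300),
]
df = pd.DataFrame.from_records(records, columns=["time", "A"])
date = datetime.date(2017, 6, 28)  # assumed to be parsed from the filename

# Prefix every time string with the date, then parse once with an explicit format.
stamps = pd.to_datetime(
    date.strftime("%Y-%m-%d ") + df["time"],
    format="%Y-%m-%d %H:%M:%S:%f",
)

# Use the parsed timestamps as the index and drop the raw text column.
indexed = df.drop(columns="time").set_index(stamps.rename("datetime"))
print(indexed)
```

Passing an explicit format avoids per-row format guessing, which is usually the safer choice on large columns.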
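Another solution is to parse the time alone, which lands on the default date 1900-01-01, and then add the offset to the real date:
date = datetime.date(2017, 6, 28)
# number of days between the default date and the real date
days = date - datetime.date(1900, 1, 1)
df['datetime'] = (pd.to_datetime(df['time'], format='%H:%M:%S:%f') +
                  pd.to_timedelta(days))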
print (df)
time A datetime
0 12:00:00:000000 100 2017-06-28 12:00:00.000000
1 12:00:01:123456 200 2017-06-28 12:00:01.123456
2 12:00:02:000000 300 2017-06-28 12:00:02.000000
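As a sanity check, the offset approach can be compared against the string-prefix approach on the sample data. This is only a sketch: the Series below is a hypothetical stand-in for the time column, and the offset is computed with pd.Timestamp subtraction rather than datetime.date subtraction, which is an equivalent variation and not the code from the answer.

```python
import datetime
import pandas as pd

# Hypothetical stand-in for the 'time' column.
times = pd.Series(["12:00:00:000000", "12:00:01:123456", "12:00:02:000000"])
date = datetime.date(2017, 6, 28)

# Parsing a time without a date yields timestamps on 1900-01-01.
parsed = pd.to_datetime(times, format="%H:%M:%S:%f")

# Shift by the gap between the default date and the real one.
offset = pd.Timestamp(date) - pd.Timestamp("1900-01-01")
shifted = parsed + offset

# Reference result: prefix the date string and parse in one pass.
prefixed = pd.to_datetime(
    date.strftime("%Y-%m-%d ") + times,
    format="%Y-%m-%d %H:%M:%S:%f",
)

# True when both approaches agree element by element.
matches = bool((shifted == prefixed).all())
print(matches)
```

Which of the two is faster likely depends on the data size and pandas version, so it is worth timing both on a real file before choosing.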