I have a Pandas Dataframe with datetime objects (incl. timedelta). When I create the DF everything is fine but when I export it to csv and then import it again the datetime objects are strings.
I tried using
pd.read_csv('xyz.csv',parse_dates=True)
when importing as well as
df.to_csv('xyz.csv',date_format='%Y-%m-%d %H:%M:%S')
when exporting. But it does not work.
Context: I created a program that generates data and puts it in pandas DFs, and these DFs must be stored until the next time the program is opened.
So my question is: Is it possible to do that with the CSV format? In general, what is the best format to export pandas DFs so that they keep as many of their properties as possible? Thank you!
Edit:
Data sample: This is a row in the DF (the indices are datetime objects). The columns are 'Tasks' (which is string format) and 'Duration' (which are the timedelta objects).
2017-04-18 08:11:39|PyMC3_Book|0 days 00:24:49.919194
That isn't how read_csv's parse_dates parameter works.
From the Docs:
So it's telling us that parse_dates=True only attempts to parse the index. Otherwise, you need to pass a list of column positions or names that indicates which columns should be parsed as dates.
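To make the difference concrete, here is a minimal sketch that reads the same CSV text twice, once with parse_dates=True and once with an explicit column list. The column names 'when' and 'task' are made up for illustration and are not from the question.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-column CSV; 'when' holds a timestamp as text.
csv_text = "when,task\n2017-04-18 08:11:39,PyMC3_Book\n"

# parse_dates=True only tries the index, so the 'when' column is left as text.
as_strings = pd.read_csv(StringIO(csv_text), parse_dates=True)

# Naming the column explicitly asks pandas to parse it as a date.
as_dates = pd.read_csv(StringIO(csv_text), parse_dates=["when"])

print(as_strings.dtypes["when"])
print(as_dates.dtypes["when"])
```

Note that parse_dates only handles datetimes; a timedelta column still needs separate handling.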
You may want to use a converters dictionary to explicitly handle these columns.
Consider the following df
df = pd.DataFrame(dict(
A=pd.to_datetime(['2017-01-01']),
B=pd.to_timedelta([37], unit='s')
))
Write it to file
df.to_csv('test.csv', index=None)
Define converters dictionary
converters = dict(A=pd.to_datetime, B=pd.to_timedelta)
# in your case
# converters = dict(Duration=pd.to_timedelta)
Read csv
df = pd.read_csv('test.csv', converters=converters)
df
           A        B
0 2017-01-01 00:00:37
df.dtypes
A datetime64[ns]
B timedelta64[ns]
dtype: object
I think you can use to_pickle and then read_pickle (see the docs):
df.to_pickle('xyz.pkl')
df = pd.read_pickle('xyz.pkl')
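As a quick check that pickling keeps the dtypes, the sketch below builds a one-row frame modelled on the data sample in the question, writes it to a temporary directory (an assumption made so the example cleans up after itself) and reads it back.

```python
import os
import tempfile

import pandas as pd

# One-row frame modelled on the sample row from the question.
df = pd.DataFrame(
    {
        "Tasks": ["PyMC3_Book"],
        "Duration": pd.to_timedelta(["0 days 00:24:49.919194"]),
    },
    index=pd.to_datetime(["2017-04-18 08:11:39"]),
)

# Round trip through a pickle file in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xyz.pkl")
    df.to_pickle(path)
    restored = pd.read_pickle(path)

print(restored.dtypes)
print(restored.index.dtype)
```

Pickle files are specific to Python, so this suits data that only your own program reads back.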
But if you need to keep the CSV format, convert the timedelta column with to_timedelta after reading:
import pandas as pd
# StringIO lives in the standard library; pandas.compat.StringIO was removed from pandas
from io import StringIO
temp=u"""Tasks|Duration
2017-04-18 08:11:39|PyMC3_Book|0 days 00:24:49.919194"""
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), sep="|", index_col=None, parse_dates=False)
print (df)
                          Tasks                Duration
2017-04-18 08:11:39  PyMC3_Book  0 days 00:24:49.919194
df.to_csv('xyz.csv')
df = pd.read_csv('xyz.csv', index_col=0, parse_dates=True)
df['Duration'] = pd.to_timedelta(df['Duration'])
print (df)
                          Tasks        Duration
2017-04-18 08:11:39  PyMC3_Book 00:24:49.919194
print (df.dtypes)
Tasks object
Duration timedelta64[ns]
dtype: object
print (df.index)
DatetimeIndex(['2017-04-18 08:11:39'], dtype='datetime64[ns]', freq=None)
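For the use case in the question (storing frames between program runs), the same steps can be wrapped in a pair of helpers. This is only a sketch: save_frame and load_frame are hypothetical names, and the default of a single 'Duration' timedelta column is taken from the data sample in the question.

```python
import os
import tempfile

import pandas as pd


def save_frame(df, path):
    """Write the frame to CSV; the datetime index becomes the first column."""
    df.to_csv(path)


def load_frame(path, timedelta_columns=("Duration",)):
    """Read the CSV back, restoring the datetime index and timedelta columns."""
    df = pd.read_csv(path, index_col=0, parse_dates=True)
    for column in timedelta_columns:
        df[column] = pd.to_timedelta(df[column])
    return df


# One-row frame modelled on the sample row from the question.
original = pd.DataFrame(
    {
        "Tasks": ["PyMC3_Book"],
        "Duration": pd.to_timedelta(["0 days 00:24:49.919194"]),
    },
    index=pd.to_datetime(["2017-04-18 08:11:39"]),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xyz.csv")
    save_frame(original, path)
    restored = load_frame(path)

print(restored.dtypes)
print(restored.index.dtype)
```

Keeping the column list as a parameter means a frame with other timedelta columns can be loaded without editing the helper.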