 

Losing timedelta format when I export to CSV - is there a solution?

I have a pandas DataFrame with datetime objects (including timedeltas). When I create the DataFrame everything is fine, but when I export it to CSV and then import it again, the datetime objects come back as strings.

I tried using

pd.read_csv('xyz.csv',parse_dates=True)

when importing as well as

df.to_csv('xyz.csv',date_format='%Y-%m-%d %H:%M:%S')

when exporting. Neither of these works.

Context: I wrote a program that generates data and puts it into pandas DataFrames, and these DataFrames must be stored until the program is opened the next time.

So my question is: is there a way to do that with the CSV format? And in general, what is the best format for exporting pandas DataFrames so that they keep as many of their properties as possible? Thank you!

Edit:

Data sample: this is one row of the DataFrame (the index values are datetime objects). The columns are 'Tasks' (strings) and 'Duration' (timedelta objects).

2017-04-18 08:11:39|PyMC3_Book|0 days 00:24:49.919194
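A minimal in-memory reproduction of the problem, assuming a one-row frame built from the data sample above (io.StringIO stands in for xyz.csv so nothing is written to disk):

```python
import io
import pandas as pd

# One-row frame shaped like the data sample: datetime index,
# a string column 'Tasks' and a timedelta column 'Duration'.
df = pd.DataFrame(
    {"Tasks": ["PyMC3_Book"],
     "Duration": pd.to_timedelta(["0 days 00:24:49.919194"])},
    index=pd.to_datetime(["2017-04-18 08:11:39"]),
)
print(df.dtypes)  # Duration is a timedelta64 column here

# Round-trip through CSV text held in memory.
buf = io.StringIO()
df.to_csv(buf)
buf.seek(0)
restored = pd.read_csv(buf, index_col=0)

# Without extra parsing, both the index and 'Duration' come back as plain strings.
print(restored.dtypes)
print(type(restored.index).__name__)
```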

Jaynes01 asked Apr 19 '17 06:04



2 Answers

That isn't how the parse_dates parameter of read_csv works.

From the docs:

  • parse_dates : boolean or list of ints or names or list of lists or dict, default False
    • boolean. If True -> try parsing the index.
    • list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
    • list of lists. e.g. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column.
    • dict, e.g. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’
  • Note: A fast-path exists for iso8601-formatted dates.

So parse_dates=True only tries to parse the index. To parse other columns as dates, you need to pass a list of their positions (or names).
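A small sketch of the difference, using a made-up CSV with hypothetical columns when, label and other:

```python
import io
import pandas as pd

csv_text = "when,label,other\n2017-04-18 08:11:39,PyMC3_Book,2017-04-19\n"

# parse_dates=True only applies to the index, so make the first column the index.
by_index = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)
print(type(by_index.index).__name__)  # DatetimeIndex
print(by_index.dtypes)                # 'other' is not parsed, only the index is

# To parse ordinary columns, list them by name (or by position).
by_column = pd.read_csv(io.StringIO(csv_text), parse_dates=["when", "other"])
print(by_column.dtypes)
```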


You may want to use a converters dictionary to handle these columns explicitly.

Consider the following df:

import pandas as pd

df = pd.DataFrame(dict(
        A=pd.to_datetime(['2017-01-01']),
        B=pd.to_timedelta([37], unit='s')
    ))

Write it to a file, without the index:

df.to_csv('test.csv', index=False)

Define a converters dictionary:

converters = dict(A=pd.to_datetime, B=pd.to_timedelta)
# in your case
# converters = dict(Duration=pd.to_timedelta)

Read the CSV:

df = pd.read_csv('test.csv', converters=converters)

df

           A        B
0 2017-01-01 00:00:37

df.dtypes

A     datetime64[ns]
B    timedelta64[ns]
dtype: object
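Note that converters are applied value by value: each function is called with the raw string of a single cell, not with the whole column. A quick way to see this, with a hypothetical record_and_convert helper wrapping pd.to_timedelta:

```python
import io
import pandas as pd

seen = []

def record_and_convert(value):
    # Hypothetical helper: remember what read_csv passes in, then convert it.
    seen.append(value)
    return pd.to_timedelta(value)

csv_text = "B\n00:00:37\n00:01:00\n"
df = pd.read_csv(io.StringIO(csv_text), converters={"B": record_and_convert})

print(seen)          # one raw string per cell of column B
print(df["B"].iloc[0])
```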
piRSquared answered Sep 29 '22 09:09



I think you can use to_pickle and then read_pickle (see the pandas docs):

df.to_pickle('xyz.pkl')

df = pd.read_pickle('xyz.pkl')
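A quick check that a pickle round trip keeps both the DatetimeIndex and the timedelta64 column, assuming a temporary directory in place of a real path and a one-row frame built from the question's data sample:

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame(
    {"Tasks": ["PyMC3_Book"],
     "Duration": pd.to_timedelta(["0 days 00:24:49.919194"])},
    index=pd.to_datetime(["2017-04-18 08:11:39"]),
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "xyz.pkl")
    df.to_pickle(path)
    restored = pd.read_pickle(path)

print(restored.dtypes)
print(restored.equals(df))  # values, dtypes and index all survive
```

Pickle files are specific to Python and pandas, so they suit storing data between runs of the same program; only load pickles from sources you trust.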

But if you need to stay with CSV, parse the index as dates when reading and convert the timedelta column afterwards:

import pandas as pd
from io import StringIO  # pandas.compat.StringIO no longer exists in recent pandas

temp=u"""Tasks|Duration
2017-04-18 08:11:39|PyMC3_Book|0 days 00:24:49.919194"""
# the header has one field fewer than the data row, so pandas uses the first column as the index
# after testing, replace StringIO(temp) with 'filename.csv'
df = pd.read_csv(StringIO(temp), sep="|", index_col=None, parse_dates=False)

print (df)
                          Tasks                Duration
2017-04-18 08:11:39  PyMC3_Book  0 days 00:24:49.919194

df.to_csv('xyz.csv')

df = pd.read_csv('xyz.csv', index_col=0, parse_dates=True)
df['Duration'] = pd.to_timedelta(df['Duration'])
print (df)
                          Tasks        Duration
2017-04-18 08:11:39  PyMC3_Book 00:24:49.919194

print (df.dtypes)
Tasks                object
Duration    timedelta64[ns]
dtype: object

print (df.index)

DatetimeIndex(['2017-04-18 08:11:39'], dtype='datetime64[ns]', freq=None)
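The same steps could be wrapped into a pair of helpers for the save-and-reload workflow described in the question. save_tasks and load_tasks are hypothetical names, and the sketch assumes the timedelta columns are known in advance:

```python
import io
import pandas as pd

def save_tasks(df, path_or_buf):
    # Hypothetical helper: write the frame, datetime index included.
    df.to_csv(path_or_buf)

def load_tasks(path_or_buf, timedelta_cols=("Duration",)):
    # Hypothetical helper: parse the index as dates, then rebuild the timedeltas.
    df = pd.read_csv(path_or_buf, index_col=0, parse_dates=True)
    for col in timedelta_cols:
        df[col] = pd.to_timedelta(df[col])
    return df

original = pd.DataFrame(
    {"Tasks": ["PyMC3_Book"],
     "Duration": pd.to_timedelta(["0 days 00:24:49.919194"])},
    index=pd.to_datetime(["2017-04-18 08:11:39"]),
)

buf = io.StringIO()  # stands in for a real file path
save_tasks(original, buf)
buf.seek(0)
restored = load_tasks(buf)
print(restored.dtypes)
```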
jezrael answered Sep 29 '22 09:09
