Reading a csv with a timestamp column, with pandas

Tags: python, pandas, csv

When doing:

import pandas
x = pandas.read_csv('data.csv', parse_dates=True, index_col='DateTime',
                    names=['DateTime', 'X'], header=None, sep=';')

with this data.csv file:

1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12

(the 1st column is a UNIX timestamp, i.e. seconds elapsed since 1/1/1970), I get this error when resampling the data every 15 seconds with x.resample('15S'):

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex 

It's like the "datetime" information has not been parsed:

                  X
DateTime
1.449054e+09  15.31
1.449054e+09  16.19
...

How can I import a CSV whose dates are stored as UNIX timestamps with pandas?

Then, once the CSV is imported, how can I access the rows for which the date is > 2015-12-02 12:02:18?

asked Dec 06 '15 by Basj


People also ask

How do I read datetime columns from a CSV file in Python?

To automatically read dates from a CSV file with pandas, pass the parse_dates argument to read_csv, naming the column(s) to parse (e.g. parse_dates=['datetime']). For non-standard formats you can also supply a custom date_parser function.

How do you auto-detect datetime columns and set their dtype when reading a CSV file in pandas?

You should add parse_dates=True, or parse_dates=['column name'], when reading; that's usually enough to parse standard date formats automatically.
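As a minimal sketch of that suggestion (the file name example.csv and the column name 'date' are hypothetical, and this assumes the column holds standard date strings rather than the raw epoch seconds from the question):

import pandas as pd

# parse_dates=['date'] asks read_csv to convert that column to datetime64.
df = pd.read_csv('example.csv', parse_dates=['date'])
print(df.dtypes)
# Note: plain Unix-timestamp floats (as in the question) are not detected this way;
# they still need pd.to_datetime(..., unit='s') or a custom parser.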

What is parse_dates in pandas?

parse_dates: bool, or list of ints or names, or list of lists, or dict; default False. If True, try parsing the index; if a list of ints or names, try parsing the given columns as dates.


2 Answers

Use to_datetime and pass unit='s' to parse the values as Unix timestamps in seconds; this will be much faster:

In [7]: pd.to_datetime(df.index, unit='s')

Out[7]:
DatetimeIndex(['2015-12-02 11:02:16.830000', '2015-12-02 11:02:17.430000',
               '2015-12-02 11:02:18.040000', '2015-12-02 11:02:18.650000',
               '2015-12-02 11:02:19.250000'],
              dtype='datetime64[ns]', name=0, freq=None)

Timings:

In [9]: %%timeit
import time

def date_parser(string_list):
    return [time.ctime(float(x)) for x in string_list]

df = pd.read_csv(io.StringIO(t), parse_dates=[0], sep=';',
                 date_parser=date_parser,
                 index_col='DateTime',
                 names=['DateTime', 'X'], header=None)

100 loops, best of 3: 4.07 ms per loop

and

In [12]: %%timeit
t = """1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12"""
df = pd.read_csv(io.StringIO(t), header=None, sep=';', index_col=[0])
df.index = pd.to_datetime(df.index, unit='s')

100 loops, best of 3: 1.69 ms per loop

So using to_datetime is over 2x faster on this small dataset, and I expect it to scale much better than the custom-parser approach.
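To cover the follow-up parts of the question (resampling and selecting rows after a given date), here is a minimal sketch that builds on this answer; the file name data.csv and the column names come from the question, while aggregating the 15-second bins with mean() is an assumption:

import pandas as pd

# Read the file: no header, ';' separator, first column used as the index.
df = pd.read_csv('data.csv', header=None, sep=';',
                 names=['DateTime', 'X'], index_col='DateTime')

# Convert the Unix-timestamp index (seconds since 1970-01-01) to a DatetimeIndex.
df.index = pd.to_datetime(df.index, unit='s')

# Resampling now works because the index is a DatetimeIndex.
resampled = df.resample('15S').mean()

# Rows strictly after a given date.
later = df[df.index > '2015-12-02 12:02:18']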

answered Oct 04 '22 by EdChum


My solution was similar to Mike's:

import pandas
import datetime

def dateparse(time_in_secs):
    return datetime.datetime.fromtimestamp(float(time_in_secs))

x = pandas.read_csv('data.csv', delimiter=';', parse_dates=True,
                    date_parser=dateparse, index_col='DateTime',
                    names=['DateTime', 'X'], header=None)

out = x.truncate(before=datetime.datetime(2015, 12, 2, 12, 2, 18))
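As a usage note (my addition, not part of the original answer): once x has a DatetimeIndex, the same selection can also be written with a boolean mask or a label slice:

# Equivalent ways to keep rows from a given instant, assuming x has a DatetimeIndex.
out = x[x.index > '2015-12-02 12:02:18']     # boolean mask, strictly after
out = x.loc['2015-12-02 12:02:18':]          # label slice, from that instant onward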
answered Oct 04 '22 by Budo Zindovic