Reading a csv with a timestamp column, with pandas

Tags: python, pandas, csv

When doing:

    import pandas
    x = pandas.read_csv('data.csv', parse_dates=True, index_col='DateTime',
                        names=['DateTime', 'X'], header=None, sep=';')

with this data.csv file:

    1449054136.83;15.31
    1449054137.43;16.19
    1449054138.04;19.22
    1449054138.65;15.12
    1449054139.25;13.12

(the first column is a UNIX timestamp, i.e. seconds elapsed since 1/1/1970), I get this error when resampling the data every 15 seconds with x.resample('15S'):

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex

It's like the "datetime" information has not been parsed:

                      X
    DateTime
    1.449054e+09  15.31
    1.449054e+09  16.19
    ...

How can I import a CSV file whose dates are stored as UNIX timestamps with pandas?

Then, once the CSV is imported, how can I select the rows for which date > 2015-12-02 12:02:18?


asked Dec 06 '15 20:12

Basj

2 Answers

Use to_datetime and pass unit='s' to parse the values as Unix timestamps (seconds since the epoch). This will be much faster:

    In [7]: pd.to_datetime(df.index, unit='s')
    Out[7]:
    DatetimeIndex(['2015-12-02 11:02:16.830000', '2015-12-02 11:02:17.430000',
                   '2015-12-02 11:02:18.040000', '2015-12-02 11:02:18.650000',
                   '2015-12-02 11:02:19.250000'],
                  dtype='datetime64[ns]', name=0, freq=None)

Timings:

    In [9]:
    %%timeit
    import time
    def date_parser(string_list):
        return [time.ctime(float(x)) for x in string_list]

    df = pd.read_csv(io.StringIO(t), parse_dates=[0], sep=';',
                     date_parser=date_parser,
                     index_col='DateTime',
                     names=['DateTime', 'X'], header=None)
    100 loops, best of 3: 4.07 ms per loop

and

    In [12]:
    %%timeit
    t="""1449054136.83;15.31
    1449054137.43;16.19
    1449054138.04;19.22
    1449054138.65;15.12
    1449054139.25;13.12"""
    df = pd.read_csv(io.StringIO(t), header=None, sep=';', index_col=[0])
    df.index = pd.to_datetime(df.index, unit='s')
    100 loops, best of 3: 1.69 ms per loop

So using to_datetime is over 2x faster on this small dataset. I expect it to scale much better than the other methods.
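For completeness, here is a minimal end-to-end sketch of the approach above, applied to the resampling that raised the TypeError in the question. It assumes the five sample rows from the question, read from an in-memory string rather than data.csv, and it uses the lowercase '15s' alias instead of '15S' on the assumption that recent pandas releases prefer it. The names raw and means are made up for illustration.

```python
import io

import pandas as pd

# Sample rows copied from the question's data.csv (Unix seconds;value).
raw = """1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12"""

# Read the first column as a plain float index; no date parsing at this stage.
df = pd.read_csv(io.StringIO(raw), sep=";", header=None,
                 names=["DateTime", "X"], index_col="DateTime")

# Interpret the float index as seconds since the Unix epoch (UTC, tz-naive).
df.index = pd.to_datetime(df.index, unit="s")

# With a DatetimeIndex in place, resample() is valid.
# Assumption: "15s" is used rather than "15S" for newer pandas versions.
means = df.resample("15s").mean()
print(means)
```

All five samples fall within the same 15-second bin, so the result should contain a single row holding their mean.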

answered Oct 04 '22 21:10

EdChum

My solution was similar to Mike's:

    import pandas
    import datetime

    def dateparse(time_in_secs):
        # Convert a Unix timestamp string to a datetime in the machine's local time zone
        return datetime.datetime.fromtimestamp(float(time_in_secs))

    x = pandas.read_csv('data.csv', delimiter=';', parse_dates=True,
                        date_parser=dateparse, index_col='DateTime',
                        names=['DateTime', 'X'], header=None)

    # Keep only the rows at or after 2015-12-02 12:02:18
    out = x.truncate(before=datetime.datetime(2015, 12, 2, 12, 2, 18))
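Note that datetime.datetime.fromtimestamp returns the machine's local time, so the rows selected by the cutoff depend on where the code runs. A possible variant, sketched below, makes the time zone explicit with to_datetime(..., utc=True) and tz_convert. The zone "Europe/Paris" is only an assumption for illustration (the question does not name one), and the names raw, cutoff, later and from_cutoff are hypothetical.

```python
import io

import pandas as pd

# Sample rows copied from the question's data.csv (Unix seconds;value).
raw = """1449054136.83;15.31
1449054137.43;16.19
1449054138.04;19.22
1449054138.65;15.12
1449054139.25;13.12"""

df = pd.read_csv(io.StringIO(raw), sep=";", header=None,
                 names=["DateTime", "X"], index_col="DateTime")

# Parse the floats as UTC instants, then convert to an explicit local zone.
# "Europe/Paris" is an assumed zone, chosen only for this example.
df.index = pd.to_datetime(df.index, unit="s", utc=True).tz_convert("Europe/Paris")

cutoff = pd.Timestamp("2015-12-02 12:02:18", tz="Europe/Paris")

# Boolean mask: rows strictly later than the cutoff.
later = df[df.index > cutoff]

# truncate() keeps rows at or after `before` (the bound is inclusive).
from_cutoff = df.truncate(before=cutoff)

print(later)
```

The two selections differ only when a row sits exactly on the cutoff: the mask excludes it, while truncate keeps it.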

answered Oct 04 '22 19:10

Budo Zindovic
