When doing:
import pandas x = pandas.read_csv('data.csv', parse_dates=True, index_col='DateTime', names=['DateTime', 'X'], header=None, sep=';')
with this data.csv
file:
1449054136.83;15.31 1449054137.43;16.19 1449054138.04;19.22 1449054138.65;15.12 1449054139.25;13.12
(the 1st colum is a UNIX timestamp, i.e. seconds elapsed since 1/1/1970), I get this error when resampling the data every 15 second with x.resample('15S')
:
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex
It's like the "datetime" information has not been parsed:
X DateTime 1.449054e+09 15.31 1.449054e+09 16.19 ...
How to import a .CSV with date stored as timestamp with pandas module?
Then once I will be able to import the CSV, how to access to the lines for which date > 2015-12-02 12:02:18 ?
To automatically read dates from a CSV file with Python Pandas, we can set the date_parser argument. to call read_csv with the file to read. And we set parse_dates to 'datetime' to parse dates with datetime .
You should add parse_dates=True , or parse_dates=['column name'] when reading, thats usually enough to magically parse it.
parse_dates : boolean or list of ints or names or list of lists or dict, default False. boolean.
Use to_datetime
and pass unit='s'
to parse the units as unix timestamps, this will be much faster:
In [7]: pd.to_datetime(df.index, unit='s') Out[7]: DatetimeIndex(['2015-12-02 11:02:16.830000', '2015-12-02 11:02:17.430000', '2015-12-02 11:02:18.040000', '2015-12-02 11:02:18.650000', '2015-12-02 11:02:19.250000'], dtype='datetime64[ns]', name=0, freq=None)
Timings:
In [9]: import time %%timeit import time def date_parser(string_list): return [time.ctime(float(x)) for x in string_list] df = pd.read_csv(io.StringIO(t), parse_dates=[0], sep=';', date_parser=date_parser, index_col='DateTime', names=['DateTime', 'X'], header=None) 100 loops, best of 3: 4.07 ms per loop
and
In [12]: %%timeit t="""1449054136.83;15.31 1449054137.43;16.19 1449054138.04;19.22 1449054138.65;15.12 1449054139.25;13.12""" df = pd.read_csv(io.StringIO(t), header=None, sep=';', index_col=[0]) df.index = pd.to_datetime(df.index, unit='s') 100 loops, best of 3: 1.69 ms per loop
So using to_datetime
is over 2x faster on this small dataset, I expect this to scale much better than the other methods
My solution was similar to Mike's:
import pandas import datetime def dateparse (time_in_secs): return datetime.datetime.fromtimestamp(float(time_in_secs)) x = pandas.read_csv('data.csv',delimiter=';', parse_dates=True,date_parser=dateparse, index_col='DateTime', names=['DateTime', 'X'], header=None) out = x.truncate(before=datetime.datetime(2015,12,2,12,2,18))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With