Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Parsing datetime from csv in pandas does not yield DateTimeIndex

Tags:

python

pandas

csv

I'm exploring Pandas - trying to learn and apply it. Currently I have a csv file populated with a financial timeseries data of following structure:

date, time, open, high, low, close, volume 2003.04.08,12:00,1.06830,1.06960,1.06670,1.06690,446 2003.04.08,13:00,1.06700,1.06810,1.06570,1.06630,433 2003.04.08,14:00,1.06650,1.06810,1.06510,1.06670,473 2003.04.08,15:00,1.06670,1.06890,1.06630,1.06850,556 2003.04.08,16:00,1.06840,1.07050,1.06610,1.06680,615

Now I want to convert the csv data into a pandas DataFrame object, so that date and time fields merge and become the DateTimeIndex of the DataFrame like this:

df = pa.read_csv(path,
                 names = ['date', 'time', 'open', 'high', 'low', 'close', 'vol'],
                 parse_dates = {'dateTime': ['date', 'time']},  
                 index_col = 'dateTime')

This works yielding a nice DataFrame object:

<class 'pandas.core.frame.DataFrame'>
Index: 8676 entries, 2003.04.08 12:00 to nan nan
Data columns (total 5 columns):
open     8675  non-null values
high     8675  non-null values
low      8675  non-null values
close    8675  non-null values
vol      8675  non-null values
dtypes: float64(5)

But upon inspection it turns out that the Index is not a DataTimeIndex but unicode strings instead:

type(df.index)
>>> pandas.core.index.Index
df.index
>>> Index([u'2003.04.08 12:00', u'2003.04.08 13:00', u'2003.04.08 14:00', ....

So read_csv parsed the date and time fields, merged them but did not create a DateTimeIndex. As far as I understood from the documentation a new datastructure object supplied with a list of datetime objects should automatically create a DateTimeIndex. Am I wrong? Is the DataFrame object an exception?

I also tried to convert the current index like this:

df.index = pa.to_datetime(df.index)

but no changes have been made to the index and it is still in unicode format. I begin to suspect the default parsing functions aren't doing their job, but I don't get any error messages from them.

How to get a working DateTimeIndex in a DateFrame in this situation?

Solution:

df = pa.read_csv(path,
                 names = ['date', 'time', 'open', 'high', 'low', 'close', 'vol'],
                 parse_dates={'datetime':['date','time']},
                 keep_date_col = True, 
                 index_col='datetime'
             )

now apply the lambda function, doing what the parser should have done:

df['datetime'] = df.apply(lambda row: datetime.datetime.strptime(row['date']+ ':' + row['time'], '%Y.%m.%d:%H:%M'), axis=1)
like image 248
EmEs Avatar asked Oct 25 '13 13:10

EmEs


People also ask

What is DatetimeIndex pandas?

DatetimeIndex [source] Immutable ndarray of datetime64 data, represented internally as int64, and which can be boxed to Timestamp objects that are subclasses of datetime and carry metadata such as frequency information.

How do I parse a date in pandas?

For non-standard datetime parsing, use pd.to_datetime after pd.read_csv . To parse an index or column with a mixture of timezones, specify date_parser to be a partially-applied pandas.to_datetime() with utc=True . See Parsing a CSV with mixed timezones for more.

Does pandas support datetime?

pandas supports converting integer or float epoch times to Timestamp and DatetimeIndex . The default unit is nanoseconds, since that is how Timestamp objects are stored internally.

Does PD read_csv return a DataFrame?

Read a CSV File In this case, the Pandas read_csv() function returns a new DataFrame with the data and labels from the file data. csv , which you specified with the first argument. This string can be any valid path, including URLs.


1 Answers

Dateutil is unable to parse your data correctly but you can do it after loading like so using strptime:

import datetime
df['DateTime'] = df.apply(lambda row: datetime.datetime.strptime(row['date']+ ':' + row['time'], '%Y.%m.%d:%H:%M'), axis=1)

This will yield you the 'DateTime' column as datetime64[ns] and you can use it as your index

EDIT

Hmm.. interestingly when I do this it works:

df = pd.read_csv(r'c:\data\temp.txt', parse_dates={'datetime':['date','time']}, index_col='datetime')

Could you see what happens when you drop the column names from the parameters to read_csv

like image 106
EdChum Avatar answered Oct 12 '22 23:10

EdChum