Parsing datetime from csv in pandas does not yield DateTimeIndex

Tags:

python

pandas

csv

I'm exploring pandas, trying to learn and apply it. Currently I have a CSV file populated with financial time-series data in the following structure:

date, time, open, high, low, close, volume
2003.04.08,12:00,1.06830,1.06960,1.06670,1.06690,446
2003.04.08,13:00,1.06700,1.06810,1.06570,1.06630,433
2003.04.08,14:00,1.06650,1.06810,1.06510,1.06670,473
2003.04.08,15:00,1.06670,1.06890,1.06630,1.06850,556
2003.04.08,16:00,1.06840,1.07050,1.06610,1.06680,615
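A minimal way to reproduce this without the original file is to load the sample rows from a string with `io.StringIO`. This sketch uses the conventional `pd` alias (the question uses `pa`), and `SAMPLE` is just an illustrative name. Note that the header line has a space after each comma, so `skipinitialspace=True` is needed to get clean column names.

```python
import io

import pandas as pd

# The sample rows from the question, header line included.
SAMPLE = """date, time, open, high, low, close, volume
2003.04.08,12:00,1.06830,1.06960,1.06670,1.06690,446
2003.04.08,13:00,1.06700,1.06810,1.06570,1.06630,433
2003.04.08,14:00,1.06650,1.06810,1.06510,1.06670,473
2003.04.08,15:00,1.06670,1.06890,1.06630,1.06850,556
2003.04.08,16:00,1.06840,1.07050,1.06610,1.06680,615
"""

# skipinitialspace=True drops the blank after each comma, so the header yields
# 'time', 'open', ... rather than ' time', ' open', ...
df = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True)

print(list(df.columns))
# ['date', 'time', 'open', 'high', 'low', 'close', 'volume']
print(df.dtypes)  # date and time stay text; the price columns are float64
```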

Now I want to convert the CSV data into a pandas DataFrame object, so that the date and time fields are merged and become the DatetimeIndex of the DataFrame, like this:

import pandas as pa

df = pa.read_csv(path,
                 names=['date', 'time', 'open', 'high', 'low', 'close', 'vol'],
                 parse_dates={'dateTime': ['date', 'time']},
                 index_col='dateTime')

This works, yielding a nice DataFrame object:

<class 'pandas.core.frame.DataFrame'>
Index: 8676 entries, 2003.04.08 12:00 to nan nan
Data columns (total 5 columns):
open     8675  non-null values
high     8675  non-null values
low      8675  non-null values
close    8675  non-null values
vol      8675  non-null values
dtypes: float64(5)

But on inspection it turns out that the index is not a DatetimeIndex but holds unicode strings instead:

type(df.index)
>>> pandas.core.index.Index
df.index
>>> Index([u'2003.04.08 12:00', u'2003.04.08 13:00', u'2003.04.08 14:00', ....

So read_csv merged the date and time fields, but did not turn them into a DatetimeIndex. As far as I understood from the documentation, a new data structure supplied with a list of datetime objects should automatically create a DatetimeIndex. Am I wrong? Is the DataFrame object an exception?
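The expectation holds for datetime objects: pandas infers a DatetimeIndex from a list of `datetime.datetime` values. The index in the output above, however, holds strings, and pandas does not parse strings into datetimes on its own. A small sketch with two rows taken from the sample shows the difference (variable names are illustrative):

```python
import datetime

import pandas as pd

# Two rows from the sample, once keyed by datetime objects and once by strings.
stamps = [datetime.datetime(2003, 4, 8, 12, 0), datetime.datetime(2003, 4, 8, 13, 0)]
texts = ["2003.04.08 12:00", "2003.04.08 13:00"]
closes = {"close": [1.06690, 1.06630]}

# datetime objects: pandas infers a DatetimeIndex.
from_datetimes = pd.DataFrame(closes, index=stamps)
# strings: pandas keeps a plain Index of text and does not parse it.
from_strings = pd.DataFrame(closes, index=texts)

print(type(from_datetimes.index).__name__)  # DatetimeIndex
print(type(from_strings.index).__name__)    # Index
```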

I also tried to convert the current index like this:

df.index = pa.to_datetime(df.index)

but the index is unchanged and still holds unicode strings. I'm beginning to suspect the default parsing functions aren't doing their job, yet they raise no error messages.
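One plausible culprit is the last index entry, `nan nan`, visible in the summary above: a row without a date or time cannot be parsed, and a single unparseable value can be enough to leave the whole column as strings. In the pandas versions current when this was asked, `to_datetime` defaulted to `errors='ignore'` and returned its input unchanged on failure; newer versions raise by default. A sketch for locating such entries, assuming the merged format `%Y.%m.%d %H:%M` seen in the index; the index itself is hypothetical:

```python
import pandas as pd

# Hypothetical index mimicking the question: two good stamps plus the
# trailing 'nan nan' entry seen in the DataFrame summary.
idx = pd.Index(["2003.04.08 12:00", "2003.04.08 13:00", "nan nan"])

# errors="coerce" turns anything that does not match the format into NaT
# instead of raising, so the result is always a DatetimeIndex.
parsed = pd.to_datetime(idx, format="%Y.%m.%d %H:%M", errors="coerce")

# The entries that failed to parse are exactly the ones that became NaT.
bad = idx[parsed.isna()]
print(list(bad))  # ['nan nan']

# Keep only the values that parsed, e.g. before using them as an index.
good = parsed[parsed.notna()]
```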

How can I get a working DatetimeIndex in a DataFrame in this situation?

Solution:

df = pa.read_csv(path,
                 names=['date', 'time', 'open', 'high', 'low', 'close', 'vol'],
                 parse_dates={'datetime': ['date', 'time']},
                 keep_date_col=True,
                 index_col='datetime')

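Now apply a lambda function that does what the parser should have done, and make the result the index:

import datetime

df['datetime'] = df.apply(lambda row: datetime.datetime.strptime(row['date'] + ':' + row['time'], '%Y.%m.%d:%H:%M'), axis=1)
df.index = df.pop('datetime')  # replace the string index with the parsed datetimes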
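The `apply` call works, but it runs `strptime` once per row in Python, which can be slow on files with many rows. A vectorized sketch of the same idea, assuming `date` and `time` are string columns (as `keep_date_col=True` leaves them); the two-row frame is made up for illustration:

```python
import datetime

import pandas as pd

# Hypothetical two-row frame with the text columns that keep_date_col=True keeps.
df = pd.DataFrame({"date": ["2003.04.08", "2003.04.08"],
                   "time": ["12:00", "13:00"],
                   "close": [1.06690, 1.06630]})

# Row by row, as in the lambda above: one strptime call per row.
row_wise = df.apply(
    lambda row: datetime.datetime.strptime(row["date"] + ":" + row["time"],
                                           "%Y.%m.%d:%H:%M"),
    axis=1)

# Vectorized: join the columns once and parse them in a single call,
# reusing the same format string.
vectorized = pd.to_datetime(df["date"] + ":" + df["time"], format="%Y.%m.%d:%H:%M")

# Both give the same timestamps; the vectorized result goes straight into the index.
df = df.set_index(vectorized).drop(columns=["date", "time"])
```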


asked Oct 25 '13 13:10

EmEs

1 Answer

The dateutil parser is unable to parse your data correctly, but you can parse it after loading, using strptime:

import datetime
df['DateTime'] = df.apply(lambda row: datetime.datetime.strptime(row['date']+ ':' + row['time'], '%Y.%m.%d:%H:%M'), axis=1)

This will give you the 'DateTime' column as datetime64[ns], which you can then use as your index.
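To actually use the column as the index, `set_index` promotes it, after which time-based selection works. A sketch with a hypothetical five-row frame standing in for the loaded data (hours 12 to 16 of the sample):

```python
import datetime

import pandas as pd

# Hypothetical frame with the answer's 'DateTime' column already filled in.
df = pd.DataFrame({
    "DateTime": [datetime.datetime(2003, 4, 8, hour) for hour in range(12, 17)],
    "close": [1.06690, 1.06630, 1.06670, 1.06850, 1.06680],
})

# Promote the column to the index; it becomes a DatetimeIndex.
df = df.set_index("DateTime")

# Time-based selection now works, e.g. label slicing (both ends inclusive).
afternoon = df.loc["2003-04-08 13:00":"2003-04-08 15:00"]
print(len(afternoon))  # 3
```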

EDIT

Hmm, interestingly, when I do this it works:

df = pd.read_csv(r'c:\data\temp.txt', parse_dates={'datetime':['date','time']}, index_col='datetime')

Could you check what happens when you drop the column names from the parameters to read_csv?
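What dropping `names` changes, assuming the real file starts with the header line shown in the question: when `names=` is given without `header=0`, read_csv reads the header line as an ordinary data row, so the text `date` and ` time` end up among the values to be parsed as dates. The read_csv documentation says to pass `header=0` explicitly when the file has a header row and you supply `names`. A sketch contrasting the two calls (`SAMPLE` and `NAMES` are illustrative):

```python
import io

import pandas as pd

# The first two sample rows from the question, header line included.
SAMPLE = """date, time, open, high, low, close, volume
2003.04.08,12:00,1.06830,1.06960,1.06670,1.06690,446
2003.04.08,13:00,1.06700,1.06810,1.06570,1.06630,433
"""
NAMES = ["date", "time", "open", "high", "low", "close", "vol"]

# names= alone: the header line is read as an ordinary data row.
with_names = pd.read_csv(io.StringIO(SAMPLE), names=NAMES)
print(with_names.loc[0, "date"])  # date

# names= plus header=0: the file's header line is replaced instead of kept as data.
with_header = pd.read_csv(io.StringIO(SAMPLE), names=NAMES, header=0)
print(with_header.loc[0, "date"])  # 2003.04.08
```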


answered Oct 12 '22 23:10

EdChum

