Difference between parse_dates=[0] and parse_dates=True in pandas.read_csv

Tags:

pandas

This code:

import pandas as pd
from StringIO import StringIO

data = "date,c1\n2012-07-31 02:00,1.1\n2012-07-31 02:15,2.2\n2012-07-31 02:30,3.3\n"

df1 = pd.read_csv(StringIO(data), parse_dates=True).set_index('date')
df2 = pd.read_csv(StringIO(data), parse_dates=[0]).set_index('date')

print "df1:\n{index}".format(index=df1.index)
print "df2:\n{index}".format(index=df2.index)

returns:

df1:
array([2012-07-31 02:00, 2012-07-31 02:15, 2012-07-31 02:30], dtype=object)
df2:
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-07-31 02:00:00, ..., 2012-07-31 02:30:00]
Length: 3, Freq: None, Timezone: None

Is this difference between df1 and df2 a bug, a feature, or have I misunderstood something?

SlimJim asked Jan 29 '26 15:01


1 Answer

Looks like a bug to me. I created an issue for this.

Note that you can set the index directly while reading by passing the *index_col* argument, which also yields a DatetimeIndex:

In [15]: df = pd.read_csv(StringIO(data),parse_dates=[0], index_col=0)

In [16]: df.index
<class 'pandas.tseries.index.DatetimeIndex'>
[2012-07-31 02:00:00, ..., 2012-07-31 02:30:00]
Length: 3, Freq: None, Timezone: None
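
For anyone trying this on a current setup, here is a minimal sketch of the same comparison, assuming Python 3 and a recent pandas release. The variable names (df_bool, df_list, df_idx) are mine and purely illustrative. It relies on the documented meaning of parse_dates=True in recent pandas, which is "try parsing the index", so without index_col there is no date index for it to parse. Treat it as an illustration of that reading, not as a statement about the pandas version used in the question.

```python
from io import StringIO

import pandas as pd

data = (
    "date,c1\n"
    "2012-07-31 02:00,1.1\n"
    "2012-07-31 02:15,2.2\n"
    "2012-07-31 02:30,3.3\n"
)

# parse_dates=True only targets the index. With no index_col, the 'date'
# column is read as text, so set_index() produces a plain (non-datetime) Index.
df_bool = pd.read_csv(StringIO(data), parse_dates=True).set_index("date")

# parse_dates=[0] converts column 0 to datetimes before it becomes the index.
df_list = pd.read_csv(StringIO(data), parse_dates=[0]).set_index("date")

# parse_dates=True together with index_col=0 parses the index directly.
df_idx = pd.read_csv(StringIO(data), parse_dates=True, index_col=0)

# Print the index type and dtype of each frame for comparison.
for name, df in [("df_bool", df_bool), ("df_list", df_list), ("df_idx", df_idx)]:
    print(name, type(df.index).__name__, df.index.dtype)
```

If you only need the date column as the index, passing index_col at read time avoids the separate set_index() call and works with either form of parse_dates.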

Wouter Overmeire answered Feb 03 '26 08:02


