Pandas read_csv not recognizing ISO8601 as datetime dtype

I am using pandas to read a CSV file into a DataFrame, with the first column as the index. The first column is in ISO 8601 format, so according to the documentation for read_csv, it should be recognized as a datetime:

In [1]: import pandas as pd

In [2]: df = pd.read_csv('data.csv', index_col=0)

In [3]: print(df.head())
                        U     V     Z    Ubar    Udir
2014-11-01 00:00:00  0.73 -0.81  0.46  1.0904  317.97
2014-11-01 01:00:00  1.26 -1.50  0.32  1.9590  319.97
2014-11-01 02:00:00  1.50 -1.80  0.13  2.3431  320.19
2014-11-01 03:00:00  1.39 -1.65  0.03  2.1575  319.89
2014-11-01 04:00:00  0.94 -1.08 -0.03  1.4318  318.96

However, when querying the index dtype, it returns 'object':

In [4]: print(df.index.dtype)
object

I then have to manually convert it to datetime dtype:

In [5]: df.index = pd.to_datetime(df.index)

In [6]: print(df.index.dtype)
datetime64[ns]

Is there any way to automatically have the index set to datetime dtype when calling read_csv()?

Peet Whittaker asked Dec 03 '14 16:12


1 Answer

I just added a column name for the first column in the CSV file, so that the data reads in like this:

                 Date     U     V     Z    Ubar    Udir
0  2014-11-01 00:00:00  0.73 -0.81  0.46  1.0904  317.97
1  2014-11-01 01:00:00  1.26 -1.50  0.32  1.9590  319.97
2  2014-11-01 02:00:00  1.50 -1.80  0.13  2.3431  320.19
3  2014-11-01 03:00:00  1.39 -1.65  0.03  2.1575  319.89
4  2014-11-01 04:00:00  0.94 -1.08 -0.03  1.4318  318.96

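import pandas as pd
df = pd.read_csv(input_file)  # input_file: path to the CSV, e.g. 'data.csv'
# Parse the 'Date' column and use the result as the index
df.index = pd.to_datetime(df['Date'], format='%Y-%m-%d %H:%M:%S')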

If you want to drop the Date column afterwards, you can use

df = df.drop('Date', axis=1)
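
As a possible alternative that is not part of the approach above, read_csv also accepts parse_dates=True, which asks it to try parsing the index column as dates while reading. The following is only a minimal sketch: it assumes the file looks like the rows shown in the question, and csv_text is a hypothetical in-memory stand-in for data.csv so the example runs on its own.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv, built from rows shown in the question.
# The first header field is left empty, as for an unnamed index column.
csv_text = """,U,V,Z,Ubar,Udir
2014-11-01 00:00:00,0.73,-0.81,0.46,1.0904,317.97
2014-11-01 01:00:00,1.26,-1.50,0.32,1.9590,319.97
2014-11-01 02:00:00,1.50,-1.80,0.13,2.3431,320.19
"""

# index_col=0 uses the first column as the index;
# parse_dates=True asks read_csv to try parsing that index as dates.
df = pd.read_csv(io.StringIO(csv_text), index_col=0, parse_dates=True)

# Should now report a datetime64 dtype rather than object.
print(df.index.dtype)
```

With a real file, the same call would presumably be pd.read_csv('data.csv', index_col=0, parse_dates=True), with no separate to_datetime step needed.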
Kirubaharan J answered Sep 22 '22 06:09