I am exporting data in datetime format to csv. When I import it back, I need to be able to read the data in as dates without any column name or column number references.
It looks like Pandas read_csv has options to automatically parse dates into datetime format but it doesn't seem to be working here.
# Create date data
df_list = [['2014-01-01','2014-02-01'],['2015-01-01','2015-02-01']]
df = pd.DataFrame(df_list,columns=['date1','date2'])
# Convert to datetime format
df['date1'] = pd.to_datetime(df['date1'])
# Export to csv
df.to_csv('_csv_file.csv',index=False)
# Read in the data and parse dates
in_df = pd.read_csv('_csv_file.csv',parse_dates=True,infer_datetime_format=True)
# Dates are not of correct type
print df.dtypes
print
print in_df.dtypes
Out [1]:
date1 datetime64[ns]
date2 object
dtype: object
date1 object
date2 object
dtype: object
Is there any way to automatically parse the date columns on import without identifying the column names or locations explicitly?
We can use the parse_dates parameter to convince pandas to turn things into real datetime types. parse_dates takes a list of columns (since you could want to parse multiple columns into datetimes ).
Python has a built-in method to parse dates, strptime . This example takes the string “2020–01–01 14:00” and parses it to a datetime object. The documentation for strptime provides a great overview of all format-string options.
Converts text to a date.
infer_datetime_format : boolean, default False. If True and parse_dates is enabled, pandas will attempt to infer the format of the datetime strings in the columns, and if it can be inferred, switch to a faster method of parsing them. In some cases this can increase the parsing speed by 5-10x.
I think you can change True
to ['date1']
in parameter parse_dates
of read_csv
, because True
means parsing index
and ['date1']
parsing column date1
:
# Read in the data and parse dates
in_df = pd.read_csv('_csv_file.csv', parse_dates=['date1'], infer_datetime_format=True )
#second solution
#instead column name - number of column
#in_df = pd.read_csv('_csv_file.csv',parse_dates=[0], infer_datetime_format=True )
# Dates are not of correct type
print df.dtypes
print
print in_df.dtypes
date1 datetime64[ns]
date2 object
dtype: object
date1 datetime64[ns]
date2 object
dtype: object
Docs:
parse_dates : boolean, list of ints or names, list of lists, or dict, default False
If True -> try parsing the index. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column. If [[1, 3]] -> combine columns 1 and 3 and parse as a single date column. {‘foo’ : [1, 3]} -> parse columns 1, 3 as date and call result ‘foo’ A fast-path exists for iso8601-formatted dates.
infer_datetime_format : boolean, default False
If True and parse_dates is enabled for a column, attempt to infer the datetime format to speed up the processing
It works, if you set column date1
as index:
# Read in the data and parse dates
in_df = pd.read_csv('_csv_file.csv', parse_dates=True, infer_datetime_format=True,
index_col='date1' )
# Dates are not of correct type
print
print in_df.dtypes
print in_df.index
date2 object
dtype: object
DatetimeIndex(['2014-01-01', '2015-01-01'], dtype='datetime64[ns]', name=u'date1', freq=None)
EDIT:
If you want parse all columns to datetime
, you can specify all columns by numbers of columns to parameter parse_dates
:
in_df = pd.read_csv('_csv_file.csv', parse_dates=[0, 1, 2, 3])
But there can be errors - some integers
can be parsed as datetimes
e.g.:
print df
print df.dtypes
date1 date2 int1 int2
0 2014-01-01 2014-02-01 2000 20111230
1 2015-01-01 2015-02-01 2014 20151230
date1 datetime64[ns]
date2 object
int1 int64
int2 int64
dtype: object
print
print in_df
print in_df.dtypes
date1 date2 int1 int2
0 2014-01-01 2014-02-01 2000-01-01 2011-12-30
1 2015-01-01 2015-02-01 2014-01-01 2015-12-30
date1 datetime64[ns]
date2 datetime64[ns]
int1 datetime64[ns]
int2 datetime64[ns]
dtype: object
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With