
Specifying date format when converting with pandas.to_datetime

I have data in a csv file with dates stored as strings in a standard UK format - %d/%m/%Y - meaning they look like:

12/01/2012
30/01/2012

The examples above represent 12 January 2012 and 30 January 2012.
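The directives in %d/%m/%Y map directly onto this day-first reading. As a quick check, here is a sketch that uses only Python's standard-library datetime.strptime (not pandas). It parses both sample strings day-first. The names UK_FORMAT, samples and parsed are illustrative only.

```python
from datetime import datetime

# Day, then month, then four-digit year, as described in the question.
UK_FORMAT = "%d/%m/%Y"

samples = ["12/01/2012", "30/01/2012"]

# strptime is strict: each string must match the format exactly,
# so there is no guessing between day-first and month-first.
parsed = {text: datetime.strptime(text, UK_FORMAT).date() for text in samples}

for text, day in parsed.items():
    print(text, "->", day.isoformat())
```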

When I imported this data with pandas version 0.11.0, I applied the following transformation:

import pandas as pd
...
cpts.Date = cpts.Date.apply(pd.to_datetime)

but it converted dates inconsistently. Using the examples above, 12/01/2012 was converted to a datetime object representing 1 December 2012, while 30/01/2012 was converted to 30 January 2012, which is what I want.
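The sketch below is a minimal reproduction of this behaviour, assuming a recent pandas release. When pd.to_datetime is applied element-wise and given no day-first hint, each string is interpreted on its own. An ambiguous value like 12/01/2012 is read month-first, while 30/01/2012 has no valid month-first reading and so ends up day-first. The Series named dates is a hypothetical stand-in for the question's cpts.Date column. Newer pandas versions may emit a UserWarning about the guessed format, which the sketch silences.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the question's cpts.Date column.
dates = pd.Series(["12/01/2012", "30/01/2012"])

with warnings.catch_warnings():
    # Recent pandas may warn when it has to guess day-first vs month-first.
    warnings.simplefilter("ignore", UserWarning)
    # Element-wise parsing, as in the question: each string is parsed independently.
    converted = dates.apply(pd.to_datetime)

print(converted)
# 12/01/2012 is read month-first (1 December 2012), while
# 30/01/2012 cannot be month-first, so it becomes 30 January 2012.
```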

After looking at this question I tried:

cpts.Date = cpts.Date.apply(pd.to_datetime, format='%d/%m/%Y')

but the results are exactly the same. The source code suggests I'm doing things right, so I'm at a loss. Does anyone know what I'm doing wrong?

asked May 21 '13 at 14:05 by cms_mgr




2 Answers

You can use the parse_dates option of read_csv to do the conversion directly while reading your data.
The trick here is to use dayfirst=True to indicate your dates start with the day and not with the month. See here for more information: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.parsers.read_csv.html

When your dates have to be the index:

>>> import pandas as pd
>>> from io import StringIO
>>> s = StringIO("""date,value
... 12/01/2012,1
... 12/01/2012,2
... 30/01/2012,3""")
>>>
>>> pd.read_csv(s, index_col=0, parse_dates=True, dayfirst=True)
            value
date
2012-01-12      1
2012-01-12      2
2012-01-30      3

Or when your dates are just in a certain column:

>>> s = StringIO("""date
... 12/01/2012
... 12/01/2012
... 30/01/2012""")
>>>
>>> pd.read_csv(s, parse_dates=[0], dayfirst=True)
                 date
0 2012-01-12 00:00:00
1 2012-01-12 00:00:00
2 2012-01-30 00:00:00
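Here is a self-contained script version of the column example, assuming Python 3 and a recent pandas release. It selects the column by name rather than by position and then checks the parsed values. The in-memory csv_text buffer is a hypothetical stand-in for the real CSV file.

```python
from io import StringIO

import pandas as pd

# In-memory stand-in for the CSV file from the question.
csv_text = """date,value
12/01/2012,1
12/01/2012,2
30/01/2012,3
"""

# parse_dates accepts column names as well as positions;
# dayfirst=True asks the parser to read the values as DD/MM/YYYY.
df = pd.read_csv(StringIO(csv_text), parse_dates=["date"], dayfirst=True)

print(df.dtypes)
print(df)
```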
answered Sep 21 '22 at 21:09 by joris


I think you are calling it correctly, and I posted this as an issue on GitHub.

You can just specify the format to to_datetime directly, for example:

In [1]: s = pd.Series(['12/1/2012', '30/01/2012'])

In [2]: pd.to_datetime(s, format='%d/%m/%Y')
Out[2]:
0   2012-01-12 00:00:00
1   2012-01-30 00:00:00
dtype: datetime64[ns]

Update: As the OP correctly points out, this doesn't work with NaN. If you are happy with dayfirst=True (which works with NaN too), you can use:

s.apply(pd.to_datetime, dayfirst=True)
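The NaN limitation mentioned in the update appears to belong to the pandas release of the time. In recent versions (worth verifying against your installed release), calling to_datetime with format on the whole Series turns missing values into NaT. Adding errors="coerce" does the same for strings that do not match the format, so they become NaT instead of raising an error. The Series raw below is hypothetical example data.

```python
import numpy as np
import pandas as pd

# Hypothetical column with a missing value and a malformed entry.
raw = pd.Series(["12/01/2012", np.nan, "30/01/2012", "not a date"])

# format pins the parse to day/month/year; errors="coerce" maps
# unparseable strings to NaT instead of raising ValueError.
parsed = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

print(parsed)
```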

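It's worth noting that you have to be careful using dayfirst (which is easier than specifying the exact format), since dayfirst isn't strict: it only expresses a preference. A string such as 01/30/2012, which has no valid day-first reading, is still parsed month-first as 30 January 2012 instead of raising an error.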
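The sketch below contrasts the two approaches on a single string that only makes sense month-first, assuming a recent pandas release. With dayfirst=True, pandas falls back to month-first parsing, and newer versions may warn about this; the sketch silences the warning. With an explicit format, pandas rejects the string with a ValueError. The variable names are illustrative only.

```python
import warnings

import pandas as pd

# 01/30/2012 cannot be day-first, because there is no month 30.
ambiguous = "01/30/2012"

with warnings.catch_warnings():
    # Recent pandas may warn that it fell back to a month-first format.
    warnings.simplefilter("ignore", UserWarning)
    # dayfirst is only a preference: the parser falls back to month-first.
    lenient = pd.to_datetime(ambiguous, dayfirst=True)

print(lenient)

# An explicit format is strict: the same string is rejected.
try:
    pd.to_datetime(ambiguous, format="%d/%m/%Y")
    strict_failed = False
except ValueError:
    strict_failed = True

print("explicit format raised ValueError:", strict_failed)
```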

answered Sep 21 '22 at 21:09 by Andy Hayden