Specifying date format when converting with pandas.to_datetime

Tags: python, datetime, pandas

I have data in a csv file with dates stored as strings in a standard UK format - %d/%m/%Y - meaning they look like:

12/01/2012
30/01/2012

The examples above represent 12 January 2012 and 30 January 2012.

When I imported this data with pandas version 0.11.0, I applied the following transformation:

import pandas as pd
...
cpts.Date = cpts.Date.apply(pd.to_datetime)

but it converted dates inconsistently. To use my existing example, 12/01/2012 would convert as a datetime object representing 1 December 2012 but 30/01/2012 converts as 30 January 2012, which is what I want.

After looking at this question I tried:

cpts.Date = cpts.Date.apply(pd.to_datetime, format='%d/%m/%Y')

but the results are exactly the same. The source code suggests I'm doing things right so I'm at a loss. Does anyone know what I'm doing wrong?

asked May 21 '13 14:05

cms_mgr

2 Answers

You can use the parse_dates option of read_csv to do the conversion directly while reading your data.
The trick here is to use dayfirst=True to indicate that your dates start with the day and not with the month. See here for more information: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.parsers.read_csv.html

When your dates have to be the index:

>>> import pandas as pd
>>> from StringIO import StringIO
>>> s = StringIO("""date,value
... 12/01/2012,1
... 12/01/2012,2
... 30/01/2012,3""")
>>>
>>> pd.read_csv(s, index_col=0, parse_dates=True, dayfirst=True)
            value
date
2012-01-12      1
2012-01-12      2
2012-01-30      3

Or when your dates are just in a certain column:

>>> s = StringIO("""date
... 12/01/2012
... 12/01/2012
... 30/01/2012""")
>>>
>>> pd.read_csv(s, parse_dates=[0], dayfirst=True)
                 date
0 2012-01-12 00:00:00
1 2012-01-12 00:00:00
2 2012-01-30 00:00:00
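The sessions above import StringIO from the Python 2 StringIO module. On Python 3 the same class lives in io, so a rough equivalent of the column case might look like the sketch below. The csv_text string is only a stand-in for a real file, and the variable names are made up for illustration.

```python
import io

import pandas as pd

# Same sample data as above; io.StringIO stands in for a real CSV file.
csv_text = "date,value\n12/01/2012,1\n12/01/2012,2\n30/01/2012,3\n"

# Parse the first column as dates, reading the day before the month.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=[0], dayfirst=True)

first = df["date"].iloc[0]
print(first.day, first.month, first.year)
```

If the conversion worked as intended, the first row is 12 January 2012 rather than 1 December 2012.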

answered Sep 21 '22 21:09

joris

I think you are calling it correctly, and I posted this as an issue on GitHub.

You can pass the format to to_datetime directly, calling it on the whole Series instead of going through apply, for example:

In [1]: s = pd.Series(['12/1/2012', '30/01/2012'])

In [2]: pd.to_datetime(s, format='%d/%m/%Y')
Out[2]:
0   2012-01-12 00:00:00
1   2012-01-30 00:00:00
dtype: datetime64[ns]
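Applied to the question's setup, the same call can be assigned straight back to the column. This is a minimal sketch: the cpts frame below is a hypothetical stand-in for the asker's data, built from the two sample dates.

```python
import pandas as pd

# Hypothetical frame standing in for the question's `cpts`.
cpts = pd.DataFrame({"Date": ["12/01/2012", "30/01/2012"]})

# Convert the whole column in one call with an explicit day-first format.
cpts["Date"] = pd.to_datetime(cpts["Date"], format="%d/%m/%Y")

# Both dates should land in January.
print(cpts["Date"].dt.month.tolist())
```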

Update: As the OP correctly points out, this doesn't work with NaN. If you are happy with dayfirst=True (which works with NaN too), you can use:

s.apply(pd.to_datetime, dayfirst=True)

It's worth noting that you have to be careful when using dayfirst (which is easier than specifying the exact format), since dayfirst isn't strict: it only expresses a preference for reading the day first.
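To illustrate what "strict" means here, the sketch below uses the standard library's datetime.strptime rather than pandas, since it rejects anything that does not match the given format. The names UK_FORMAT and parse_uk_date are invented for this example.

```python
from datetime import datetime

UK_FORMAT = "%d/%m/%Y"


def parse_uk_date(text):
    """Return a datetime for a day-first string, or None if it does not fit."""
    try:
        return datetime.strptime(text, UK_FORMAT)
    except ValueError:
        # Raised when the string cannot be read as day/month/year.
        return None


ok = parse_uk_date("30/01/2012")   # 30 January 2012
bad = parse_uk_date("01/30/2012")  # there is no month 30, so this is None
print(ok, bad)
```

A strict parser like this reports a month-first string as a failure instead of quietly reinterpreting it, which is the trade-off against the convenience of dayfirst.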

answered Sep 21 '22 21:09

Andy Hayden
