I have data in a CSV file with dates stored as strings in the standard UK format, %d/%m/%Y, meaning they look like:
12/01/2012
30/01/2012
The examples above represent 12 January 2012 and 30 January 2012.
When I imported this data with pandas version 0.11.0, I applied the following transformation:
import pandas as pd
...
cpts.Date = cpts.Date.apply(pd.to_datetime)
but it converted the dates inconsistently. To use my existing example, 12/01/2012 was converted to a datetime object representing 1 December 2012, while 30/01/2012 was converted to 30 January 2012, which is what I want.
After looking at this question I tried:
cpts.Date = cpts.Date.apply(pd.to_datetime, format='%d/%m/%Y')
but the results are exactly the same. The source code suggests I'm doing things right so I'm at a loss. Does anyone know what I'm doing wrong?
You can use the parse_dates option from read_csv to do the conversion directly while reading your data.
The trick here is to use dayfirst=True to indicate that your dates start with the day and not with the month. See here for more information: http://pandas.pydata.org/pandas-docs/dev/generated/pandas.io.parsers.read_csv.html
When your dates have to be the index:
>>> import pandas as pd
>>> from StringIO import StringIO
>>> s = StringIO("""date,value
... 12/01/2012,1
... 12/01/2012,2
... 30/01/2012,3""")
>>>
>>> pd.read_csv(s, index_col=0, parse_dates=True, dayfirst=True)
            value
date
2012-01-12      1
2012-01-12      2
2012-01-30      3
Or when your dates are just in a certain column:
>>> s = StringIO("""date
... 12/01/2012
... 12/01/2012
... 30/01/2012""")
>>>
>>> pd.read_csv(s, parse_dates=[0], dayfirst=True)
                 date
0 2012-01-12 00:00:00
1 2012-01-12 00:00:00
2 2012-01-30 00:00:00
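For readers on Python 3, where StringIO lives in the io module, the same two read_csv calls can be sketched as below. This is only a minimal sketch assuming a reasonably recent pandas release; the names csv_text, indexed and as_column are made up for illustration and are not part of the original answer.

```python
from io import StringIO  # Python 3 home of StringIO

import pandas as pd

# Illustrative data in UK day/month/year order.
csv_text = """date,value
12/01/2012,1
12/01/2012,2
30/01/2012,3"""

# Case 1: the dates become the index.
indexed = pd.read_csv(
    StringIO(csv_text), index_col=0, parse_dates=True, dayfirst=True
)

# Case 2: the dates stay in an ordinary column, selected by name.
as_column = pd.read_csv(
    StringIO(csv_text), parse_dates=["date"], dayfirst=True
)

print(indexed)
print(as_column)
```

In both cases every row should come out as a January date, because dayfirst=True tells the parser to read the first number as the day.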
I think you are calling it correctly, and I posted this as an issue on GitHub.
You can just specify the format to to_datetime directly, for example:
In [1]: s = pd.Series(['12/1/2012', '30/01/2012'])
In [2]: pd.to_datetime(s, format='%d/%m/%Y')
Out[2]:
0 2012-01-12 00:00:00
1 2012-01-30 00:00:00
dtype: datetime64[ns]
Update: As the OP correctly points out, this doesn't work with NaN. If you are happy with dayfirst=True (which works with NaN too), you can use:
s.apply(pd.to_datetime, dayfirst=True)
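As a rough illustration of the missing-value case, here is a small sketch that calls to_datetime on the whole Series rather than through apply. The sample data is invented, and the second call assumes a recent pandas release, where (as far as I know) an explicit format also turns missing values into NaT; that was not the behaviour reported for 0.11.0 above.

```python
import numpy as np
import pandas as pd

# Invented sample: UK-style dates with one missing entry.
s = pd.Series(["12/01/2012", np.nan, "30/01/2012"])

# Vectorised call with dayfirst=True; the missing entry becomes NaT.
by_dayfirst = pd.to_datetime(s, dayfirst=True)

# Explicit format on the whole Series (assumes a recent pandas release).
by_format = pd.to_datetime(s, format="%d/%m/%Y")

print(by_dayfirst)
print(by_format)
```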
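It is worth noting that you have to be careful when using dayfirst (which is easier than specifying the exact format), since dayfirst isn't strict: a string that cannot be read day-first, such as 01/30/2012, is still parsed, just as month-first.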
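To make the "isn't strict" point concrete, here is a hedged sketch comparing the two approaches on a string that cannot be a day-first date. The variable names are made up for illustration, and recent pandas versions may emit a warning for the dayfirst call, which is silenced here.

```python
import warnings

import pandas as pd

# "01/30/2012" cannot be day-first, since there is no month 30.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # newer pandas may warn about the fallback
    lenient = pd.to_datetime("01/30/2012", dayfirst=True)

# With an explicit format the same string is rejected instead of reinterpreted.
try:
    pd.to_datetime("01/30/2012", format="%d/%m/%Y")
    strict_rejected = False
except ValueError:
    strict_rejected = True

print(lenient)          # parsed as 30 January 2012, i.e. month-first
print(strict_rejected)
```

So dayfirst is best treated as a preference for ambiguous strings, while format is the option to reach for when bad rows should fail loudly.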