pd.read_csv not correctly parsing day/month fields when setting parse_dates=['column name']

Tags: python, datetime, pandas, parsing, date-format

I ran into this bug while trying to parse a few dates through the parse_dates parameter of pandas.read_csv(). In the following code snippet, I'm trying to parse dates in dd/mm/yy format, and the conversion comes out wrong: in some cases the day field is treated as the month and vice versa.

To put it simply, some dd/mm/yy values get converted to yyyy-dd-mm instead of yyyy-mm-dd.

Case 1:

  04/10/96 is parsed as 1996-04-10, which is wrong.

Case 2:

  15/07/97 is parsed as 1997-07-15, which is correct.

Case 3:

  10/12/97 is parsed as 1997-10-12, which is wrong.
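The three cases differ in whether the string is ambiguous. As a quick illustration using only the standard library's datetime.strptime (not the code path pandas itself uses), the sketch below lists every valid reading of a string under DD/MM/YY and MM/DD/YY; valid_readings is a hypothetical helper written just for this check. 04/10/96 and 10/12/97 have two valid readings, while 15/07/97 has only one because there is no month 15:

```python
from datetime import datetime


def valid_readings(text):
    """Return every date `text` can represent as DD/MM/YY or MM/DD/YY."""
    readings = set()
    for fmt in ("%d/%m/%y", "%m/%d/%y"):
        try:
            readings.add(datetime.strptime(text, fmt).date())
        except ValueError:
            pass  # this field order is impossible for this string
    return sorted(readings)


for s in ["04/10/96", "15/07/97", "10/12/97"]:
    print(s, [d.isoformat() for d in valid_readings(s)])
# 04/10/96 ['1996-04-10', '1996-10-04']
# 15/07/97 ['1997-07-15']
# 10/12/97 ['1997-10-12', '1997-12-10']
```

Only the ambiguous strings come out "wrong", which points at a field-order preference rather than random misparsing.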

Code Sample

import pandas as pd

df = pd.read_csv('date_time.csv')
print('Data in csv:')
print(df)
print(df['start_date'].dtypes)

print('----------------------------------------------')

df = pd.read_csv('date_time.csv', parse_dates=['start_date'])
print('Data after parsing:')
print(df)
print(df['start_date'].dtypes)

Current Output

----------------------
Data in csv:
----------------------
  start_date
0   04/10/96
1   15/07/97
2   10/12/97
3   06/03/99
4     //1994
5   /02/1967
object
----------------------
Data after parsing:
----------------------
   start_date
0 1996-04-10
1 1997-07-15
2 1997-10-12
3 1999-06-03
4 1994-01-01
5 1967-02-01
datetime64[ns]

Expected Output

----------------------
Data in csv:
----------------------
   start_date
0   04/10/96
1   15/07/97
2   10/12/97
3   06/03/99
4     //1994
5   /02/1967
object
----------------------
Data after parsing:
----------------------
   start_date
0 1996-10-04
1 1997-07-15
2 1997-12-10
3 1999-03-06
4 1994-01-01
5 1967-02-01
datetime64[ns]

More Comments:

I could use date_parser or pandas.to_datetime() to specify the proper date format. But in my case, I have a few date fields like ['//1997', '/02/1967'] that I need to convert to ['01/01/1997', '01/02/1967']. parse_dates converts those kinds of fields to the expected format without me having to write extra code.
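If the partial dates are the only obstacle to using an explicit format, one option is to fill the missing day and month with 01 before parsing. A minimal plain-Python sketch, assuming every value is '/'-separated as DD/MM/YY or DD/MM/YYYY and that only the day and/or month can be empty; fill_missing_parts and parse_day_first are hypothetical helpers:

```python
from datetime import datetime


def fill_missing_parts(text):
    """Pad empty day/month fields with '01', e.g. '//1997' -> '01/01/1997'.

    Assumes '/'-separated DD/MM/YY(YY) where only day and/or month may be empty.
    """
    day, month, year = text.split("/")
    return "/".join([day or "01", month or "01", year])


def parse_day_first(text):
    """Fill the gaps, then parse strictly day-first with a 2- or 4-digit year."""
    filled = fill_missing_parts(text)
    year = filled.rsplit("/", 1)[1]
    fmt = "%d/%m/%Y" if len(year) == 4 else "%d/%m/%y"
    return datetime.strptime(filled, fmt).date()


print(fill_missing_parts("//1997"), fill_missing_parts("/02/1967"))
# 01/01/1997 01/02/1967
```

On the DataFrame, something like pd.to_datetime(df['start_date'].map(parse_day_first)) should then give a datetime64 column; because the format is explicit, an unexpected value raises instead of being silently read month-first.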

Is there any solution for this?

Bug Link @GitHub: https://github.com/pydata/pandas/issues/13063


asked May 03 '16 07:05

Saranya Krishnamurthy

1 Answer

In pandas 0.18.0 you can add the parameter dayfirst=True, and then it works:

import pandas as pd
import io

temp=u"""start_date
04/10/96
15/07/97
10/12/97
06/03/99
//1994
/02/1967
"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), parse_dates=['start_date'], dayfirst=True)
print(df)
  start_date
0 1996-10-04
1 1997-07-15
2 1997-12-10
3 1999-03-06
4 1994-01-01
5 1967-02-01
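Why this works: as the pandas to_datetime documentation puts it, dayfirst=True is not strict; it only makes the parser prefer day-first when a date is ambiguous. The plain-Python sketch below imitates that preference for full DD/MM/YY strings; parse_with_preference and its fallback order are assumptions for illustration, not pandas' implementation. With the default month-first preference it reproduces the question's Current Output, and with dayfirst=True the expected one:

```python
from datetime import datetime


def parse_with_preference(text, dayfirst=False):
    """Try the preferred field order first, then fall back to the other one."""
    orders = ["%d/%m/%y", "%m/%d/%y"]
    if not dayfirst:
        orders.reverse()  # default preference: month-first
    for fmt in orders:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # preferred order impossible, try the other one
    raise ValueError(f"unparseable date: {text!r}")


rows = ["04/10/96", "15/07/97", "10/12/97", "06/03/99"]
print([parse_with_preference(r).isoformat() for r in rows])
# ['1996-04-10', '1997-07-15', '1997-10-12', '1999-06-03']
print([parse_with_preference(r, dayfirst=True).isoformat() for r in rows])
# ['1996-10-04', '1997-07-15', '1997-12-10', '1999-03-06']
```

The fallback also means a value such as 01/13/97 is still accepted as 13 January under dayfirst=True, so the flag alone will not flag inconsistent data; an explicit format would.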

Another solution:

You can parse with to_datetime several times, using a different format each time together with errors='coerce', and then merge the results with combine_first:

date1 = pd.to_datetime(df['start_date'], format='%d/%m/%y', errors='coerce')
print(date1)
0   1996-10-04
1   1997-07-15
2   1997-12-10
3   1999-03-06
4          NaT
5          NaT
Name: start_date, dtype: datetime64[ns]

date2 = pd.to_datetime(df['start_date'], format='/%m/%Y', errors='coerce')
print(date2)
0          NaT
1          NaT
2          NaT
3          NaT
4          NaT
5   1967-02-01
Name: start_date, dtype: datetime64[ns]

date3 = pd.to_datetime(df['start_date'], format='//%Y', errors='coerce')
print(date3)
0          NaT
1          NaT
2          NaT
3          NaT
4   1994-01-01
5          NaT
Name: start_date, dtype: datetime64[ns]

print(date1.combine_first(date2).combine_first(date3))
0   1996-10-04
1   1997-07-15
2   1997-12-10
3   1999-03-06
4   1994-01-01
5   1967-02-01
Name: start_date, dtype: datetime64[ns]
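The date1/date2/date3 chain amounts to "try each format in turn and keep the first one that matches". A plain-Python sketch of the same idea for a single string, using the same three formats in the same priority order; coerce and first_match are hypothetical names:

```python
from datetime import datetime

# Same formats as date1, date2 and date3, in the same priority order.
FORMATS = ["%d/%m/%y", "/%m/%Y", "//%Y"]


def coerce(text, fmt):
    """Like errors='coerce': return None instead of raising on a mismatch."""
    try:
        return datetime.strptime(text, fmt).date()
    except ValueError:
        return None


def first_match(text, formats=FORMATS):
    """Like chaining combine_first: keep the first non-missing result."""
    for fmt in formats:
        parsed = coerce(text, fmt)
        if parsed is not None:
            return parsed
    return None


for s in ["04/10/96", "/02/1967", "//1994"]:
    print(s, first_match(s))
```

To apply it to the column, something like pd.to_datetime(df['start_date'].map(first_match)) should give a datetime64 column with NaT wherever no format matched. The vectorised to_datetime version is likely faster on large frames, since it avoids a Python-level call per row.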


answered Sep 28 '22 10:09

jezrael
