I ran in to this bug while trying to parse the few dates through parse_dates of pandas.read_csv()
. In the following code snippet, I'm trying to parse dates that have format dd/mm/yy
which is resulting me an improper conversion. For some cases, the date field is considered as month and vice versa.
To keep it simple, for some cases dd/mm/yy
get converted to yyyy-dd-mm
instead of yyyy-mm-dd
.
Case 1:
04/10/96 is parsed as 1996-04-10, which is wrong.
Case 2:
15/07/97 is parsed as 1997-07-15, which is correct.
Case 3:
10/12/97 is parsed as 1997-10-12, which is wrong.
Code Sample
import pandas as pd
df = pd.read_csv('date_time.csv')
print 'Data in csv:'
print df
print df['start_date'].dtypes
print '----------------------------------------------'
df = pd.read_csv('date_time.csv', parse_dates = ['start_date'])
print 'Data after parsing:'
print df
print df['start_date'].dtypes
Current Output
----------------------
Data in csv:
----------------------
start_date
0 04/10/96
1 15/07/97
2 10/12/97
3 06/03/99
4 //1994
5 /02/1967
object
----------------------
Data after parsing:
----------------------
start_date
0 1996-04-10
1 1997-07-15
2 1997-10-12
3 1999-06-03
4 1994-01-01
5 1967-02-01
datetime64[ns]
Expected Output
----------------------
Data in csv:
----------------------
start_date
0 04/10/96
1 15/07/97
2 10/12/97
3 06/03/99
4 //1994
5 /02/1967
object
----------------------
Data after parsing:
----------------------
start_date
0 1996-10-04
1 1997-07-15
2 1997-12-10
3 1999-03-06
4 1994-01-01
5 1967-02-01
datetime64[ns]
More Comments:
I could use date_parser
or pandas.to_datetime()
to specify the proper format for date. But in my case, I have few date fields like ['//1997', '/02/1967']
for which I need to convert ['01/01/1997','01/02/1967']
. The parse_dates
helps me in converting those type of date fields to the expected format without making me to write extra line of code.
Is there any solution for this?
Bug Link @GitHub: https://github.com/pydata/pandas/issues/13063
parse_dates : boolean or list of ints or names or list of lists or dict, default False. boolean. If True -> try parsing the index. list of ints or names. e.g. If [1, 2, 3] -> try parsing columns 1, 2, 3 each as a separate date column.
Pandas intelligently handles DateTime values when you import a dataset into a DataFrame. The library will try to infer the data types of your columns when you first import a dataset.
Python has a built-in method to parse dates, strptime . This example takes the string “2020–01–01 14:00” and parses it to a datetime object. The documentation for strptime provides a great overview of all format-string options.
Read a comma-separated values (csv) file into DataFrame. Also supports optionally iterating or breaking of the file into chunks.
In version pandas 0.18.0
you can add parameter dayfirst=True
and then it works:
import pandas as pd
import io
temp=u"""start_date
04/10/96
15/07/97
10/12/97
06/03/99
//1994
/02/1967
"""
#after testing replace io.StringIO(temp) to filename
df = pd.read_csv(io.StringIO(temp), parse_dates = ['start_date'], dayfirst=True)
start_date
0 1996-10-04
1 1997-07-15
2 1997-12-10
3 1999-03-06
4 1994-01-01
5 1967-02-01
Another solution:
You can parsing with to_datetime
with different parameters format
and errors='coerce'
and then combine_first
:
date1 = pd.to_datetime(df['start_date'], format='%d/%m/%y', errors='coerce')
print date1
0 1996-10-04
1 1997-07-15
2 1997-12-10
3 1999-03-06
4 NaT
5 NaT
Name: start_date, dtype: datetime64[ns]
date2 = pd.to_datetime(df['start_date'], format='/%m/%Y', errors='coerce')
print date2
0 NaT
1 NaT
2 NaT
3 NaT
4 NaT
5 1967-02-01
Name: start_date, dtype: datetime64[ns]
date3 = pd.to_datetime(df['start_date'], format='//%Y', errors='coerce')
print date3
0 NaT
1 NaT
2 NaT
3 NaT
4 1994-01-01
5 NaT
Name: start_date, dtype: datetime64[ns]
print date1.combine_first(date2).combine_first(date3)
0 1996-10-04
1 1997-07-15
2 1997-12-10
3 1999-03-06
4 1994-01-01
5 1967-02-01
Name: start_date, dtype: datetime64[ns]
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With