I am reading from two different CSVs each having date values in their columns. After read_csv I want to convert the data to datetime with the to_datetime method. The formats of the dates in each CSV are slightly different, and although the differences are noted and specified in the to_datetime format argument, the one converts fine, while the other returns the following value error.
ValueError: to assemble mappings requires at least that [year, month, day] be sp
ecified: [day,month,year] is missing
first dte.head()
0 10/14/2016 10/17/2016 10/19/2016 8/9/2016 10/17/2016 7/20/2016
1 7/15/2016 7/18/2016 7/20/2016 6/7/2016 7/18/2016 4/19/2016
2 4/15/2016 4/14/2016 4/18/2016 3/15/2016 4/18/2016 1/14/2016
3 1/15/2016 1/19/2016 1/19/2016 10/19/2015 1/19/2016 10/13/2015
4 10/15/2015 10/14/2015 10/19/2015 7/23/2015 10/14/2015 7/15/2015
this dataframe converts fine using the following code:
dte = pd.to_datetime(dte, infer_datetime_format=True)
or
dte = pd.to_datetime(dte[x], format='%m/%d/%Y')
the second dtd.head()
0 2004-01-02 2004-01-02 2004-01-09 2004-01-16 2004-01-23 2004-01-30
1 2004-01-05 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
2 2004-01-06 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
3 2004-01-07 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
4 2004-01-08 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
this csv doesn't convert using either:
dtd = pd.to_datetime(dtd, infer_datetime_format=True)
or
dtd = pd.to_datetime(dtd, format='%Y-%m-%d')
It returns the value error above. Interestingly, however, using the parse_dates and infer_datetime_format as arguments of the read_csv method work fine. What is going on here?
You can stack
/ pd.to_datetime
/ unstack
pd.to_datetime(dte.stack()).unstack()
explanationpd.to_datetime
works on a string, list, or pd.Series
. dte
is a pd.DataFrame
and is why you are having issues. dte.stack()
produces a a pd.Series
where all rows are stacked on top of each other. However, in this stacked form, because it is a pd.Series
, I can get a vectorized pd.to_datetime
to work on it. the subsequent unstack
simply reverses the initial stack
to get the original form of dte
For me works apply
function to_datetime
:
print (dtd)
1 2 3 4 5 6
0
0 2004-01-02 2004-01-02 2004-01-09 2004-01-16 2004-01-23 2004-01-30
1 2004-01-05 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
2 2004-01-06 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
3 2004-01-07 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
4 2004-01-08 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
dtd = dtd.apply(pd.to_datetime)
print (dtd)
1 2 3 4 5 6
0
0 2004-01-02 2004-01-02 2004-01-09 2004-01-16 2004-01-23 2004-01-30
1 2004-01-05 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
2 2004-01-06 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
3 2004-01-07 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
4 2004-01-08 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
It works for me:
dtd.apply(lambda x: pd.to_datetime(x,errors = 'coerce', format = '%Y-%m-%d'))
This way you can use function attributes like above (errors and format). See more https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With