Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

to_datetime Value Error: at least that [year, month, day] must be specified Pandas

I am reading from two different CSVs each having date values in their columns. After read_csv I want to convert the data to datetime with the to_datetime method. The formats of the dates in each CSV are slightly different, and although the differences are noted and specified in the to_datetime format argument, the one converts fine, while the other returns the following value error.

ValueError: to assemble mappings requires at least that [year, month, day] be sp
ecified: [day,month,year] is missing

first dte.head()

0  10/14/2016  10/17/2016  10/19/2016    8/9/2016  10/17/2016   7/20/2016
1   7/15/2016   7/18/2016   7/20/2016    6/7/2016   7/18/2016   4/19/2016
2   4/15/2016   4/14/2016   4/18/2016   3/15/2016   4/18/2016   1/14/2016
3   1/15/2016   1/19/2016   1/19/2016  10/19/2015   1/19/2016  10/13/2015
4  10/15/2015  10/14/2015  10/19/2015   7/23/2015  10/14/2015   7/15/2015

this dataframe converts fine using the following code:

dte = pd.to_datetime(dte, infer_datetime_format=True)

or

dte = pd.to_datetime(dte[x], format='%m/%d/%Y')

the second dtd.head()

0   2004-01-02 2004-01-02  2004-01-09 2004-01-16  2004-01-23  2004-01-30
1   2004-01-05 2004-01-09  2004-01-16 2004-01-23  2004-01-30  2004-02-06
2   2004-01-06 2004-01-09  2004-01-16 2004-01-23  2004-01-30  2004-02-06
3   2004-01-07 2004-01-09  2004-01-16 2004-01-23  2004-01-30  2004-02-06
4   2004-01-08 2004-01-09  2004-01-16 2004-01-23  2004-01-30  2004-02-06

this csv doesn't convert using either:

dtd = pd.to_datetime(dtd, infer_datetime_format=True)

or

dtd = pd.to_datetime(dtd, format='%Y-%m-%d')

It returns the value error above. Interestingly, however, using the parse_dates and infer_datetime_format as arguments of the read_csv method work fine. What is going on here?

like image 963
Jed Avatar asked Oct 12 '16 07:10

Jed


3 Answers

You can stack / pd.to_datetime / unstack

pd.to_datetime(dte.stack()).unstack()

enter image description here

explanation
pd.to_datetime works on a string, list, or pd.Series. dte is a pd.DataFrame and is why you are having issues. dte.stack() produces a a pd.Series where all rows are stacked on top of each other. However, in this stacked form, because it is a pd.Series, I can get a vectorized pd.to_datetime to work on it. the subsequent unstack simply reverses the initial stack to get the original form of dte

like image 71
piRSquared Avatar answered Oct 29 '22 00:10

piRSquared


For me works apply function to_datetime:

print (dtd)
            1           2           3           4           5           6
0                                                                        
0  2004-01-02  2004-01-02  2004-01-09  2004-01-16  2004-01-23  2004-01-30
1  2004-01-05  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
2  2004-01-06  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
3  2004-01-07  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06
4  2004-01-08  2004-01-09  2004-01-16  2004-01-23  2004-01-30  2004-02-06


dtd = dtd.apply(pd.to_datetime)

print (dtd)
           1          2          3          4          5          6
0                                                                  
0 2004-01-02 2004-01-02 2004-01-09 2004-01-16 2004-01-23 2004-01-30
1 2004-01-05 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
2 2004-01-06 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
3 2004-01-07 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
4 2004-01-08 2004-01-09 2004-01-16 2004-01-23 2004-01-30 2004-02-06
like image 28
jezrael Avatar answered Oct 29 '22 01:10

jezrael


It works for me:

dtd.apply(lambda x: pd.to_datetime(x,errors = 'coerce', format = '%Y-%m-%d'))

This way you can use function attributes like above (errors and format). See more https://pandas.pydata.org/pandas-docs/stable/generated/pandas.to_datetime.html

like image 4
Guilherme Fernandes Lopes Avatar answered Oct 29 '22 00:10

Guilherme Fernandes Lopes