Python

Question

Beginner python (and therefore pandas) user. I am trying to import some data into a pandas dataframe. One of the columns is the date, but in the format "YYYYMM". I have attempted to do what most forum responses suggest:

df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m')

This doesn't work though (ValueError: unconverted data remains: 3). The column actually includes an additional value for each year, with MM=13. The source used this row as an average of the past year. I am guessing to_datetime is having an issue with that.

Could anyone offer a quick solution, either to strip out all of the annual averages (those with the last two digits "13"), or to have to_datetime ignore them?

EdChum · Accepted Answer

pass errors='coerce' and then dropna the NaT rows:

df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m', errors='coerce').dropna()

The duff month values will get converted to NaT values

In[36]:
pd.to_datetime('201613', format='%Y%m', errors='coerce')

Out[36]: NaT

Alternatively you could filter them out before the conversion

df_cons['YYYYMM'] = pd.to_datetime(df_cons.loc[df_cons['YYYYMM'].str[-2:] != '13','YYYYMM'], format='%Y%m', errors='coerce')

although this could lead to alignment issues as the returned Series needs to be the same length so just passing errors='coerce' is a simpler solution

Python - Pandas - Convert YYYYMM to datetime

Tags:

pandas

Mtd240

1 Answers

EdChum

Recent Activity

Donate For Us