Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Converting "year" and "week of year" columns to "date" in Pandas

Desired way

In order to convert two columns with year and week of year into date I would expect to do something like:

df['formatted_date'] = df.year*100+df.weekofyear
df['date'] = pd.to_datetime(df['formatted_date'], format='%Y%w')

However, it does not work, giving ValueError:

ValueError: unconverted data remains: 01

Workaround

The workaround, I have found is converting week of year into a day of year and working with year-dayofyear %Y%j format:

df['formatted_date'] = df.year*1000+df.weekofyear*7-6 
df['date'] = pd.to_datetime(df['formatted_date'], format='%Y%j')

The first line becomes ugly, but this works fine. Week of year is in the range (00,53). Any ideas, why is the elegant way not working?

like image 754
Sokolokki Avatar asked Mar 18 '19 13:03

Sokolokki


1 Answers

You need combine %w for day of week - explanation with %W for week:

http://strftime.org/ for %W:

Week number of the year (Monday as the first day of the week) as a decimal number. All days in a new year preceding the first Monday are considered to be in week 0.

And for %w:

Weekday as a decimal number, where 0 is Sunday and 6 is Saturday.


df = pd.DataFrame({'year':[2015, 2018],
                   'weekofyear':[10,12]})

dates = df.year*100+df.weekofyear
@adde
df['date'] = pd.to_datetime(dates.astype(str) + '0', format='%Y%W%w')
print (df)

   year  weekofyear  formatted_date       date
0  2015  10          201510         2015-03-15
1  2018  12          201812         2018-03-25

Another solution:

#added 0 only for demontration, you can remove it
df['formatted_date'] = df.year * 1000 + df.weekofyear * 10 + 0
df['date'] = pd.to_datetime(df['formatted_date'], format='%Y%W%w')
print (df)

   year  weekofyear  formatted_date       date
0  2015  10          2015100        2015-03-15
1  2018  12          2018120        2018-03-25
like image 102
jezrael Avatar answered Oct 12 '22 10:10

jezrael