I have a dataframe which has Dates and public holidays
Date WeekNum Public_Holiday
1/1/2015 1 1
2/1/2015 1 0
3/1/2015 1 0
4/1/2015 1 0
5/1/2015 1 0
6/1/2015 1 0
7/1/2015 1 0
8/1/2015 2 0
9/1/2015 2 0
10/1/2015 2 0
11/1/2015 2 0
12/1/2015 2 0
13/1/2015 2 0
I have to create a conditional column named Public_Holiday_Week, which should return 1, if that particular week has a public holiday
And I want to see an output like this
Date WeekNum Public_Holiday Public_Holiday_Week
1/1/2015 1 1 1
2/1/2015 1 0 1
3/1/2015 1 0 1
4/1/2015 1 0 1
5/1/2015 1 0 1
6/1/2015 1 0 1
7/1/2015 1 0 1
8/1/2015 2 0 0
9/1/2015 2 0 0
10/1/2015 2 0 0
11/1/2015 2 0 0
12/1/2015 2 0 0
13/1/2015 2 0 0
I tried using np.where
df['Public_Holiday_Week'] = np.where(df['Public_Holiday']==1,1,0)
But it applies 0 for other days of the week when it is not a public holiday.
Do I have to apply rolling here? Appreciate your help
For improve performance dont use groupby
, rather get all WeekNum
with at least one 1
and then select values by isin
, last cast boolean mask to int
s:
weeks = df.loc[df['Public_Holiday'].eq(1), 'WeekNum']
df['Public_Holiday_Week'] = df['WeekNum'].isin(weeks).astype(int)
print (df)
Date WeekNum Public_Holiday Public_Holiday_Week
0 1/1/2015 1 1 1
1 2/1/2015 1 0 1
2 3/1/2015 1 0 1
3 4/1/2015 1 0 1
4 5/1/2015 1 0 1
5 6/1/2015 1 0 1
6 7/1/2015 1 0 1
7 8/1/2015 2 0 0
8 9/1/2015 2 0 0
9 10/1/2015 2 0 0
10 11/1/2015 2 0 0
11 12/1/2015 2 0 0
12 13/1/2015 2 0 0
As pointed @Mohamed Thasin ah if necessary is possible groupby by week, but then get different output, because different week
numbers:
df['weeks'] = pd.to_datetime(df['Date'], dayfirst=True).dt.week
weeks = df.loc[df['Public_Holiday'].eq(1), 'weeks']
df['Public_Holiday_Week'] = df['weeks'].isin(weeks).astype(int)
print (df)
Date WeekNum Public_Holiday weeks Public_Holiday_Week
0 1/1/2015 1 1 1 1
1 2/1/2015 1 0 1 1
2 3/1/2015 1 0 1 1
3 4/1/2015 1 0 1 1
4 5/1/2015 1 0 2 0
5 6/1/2015 1 0 2 0
6 7/1/2015 1 0 2 0
7 8/1/2015 2 0 2 0
8 9/1/2015 2 0 2 0
9 10/1/2015 2 0 2 0
10 11/1/2015 2 0 2 0
11 12/1/2015 2 0 3 0
12 13/1/2015 2 0 3 0
resample
and skip the use of the WeekNum
column altogether.df.assign(
Public_Holiday_Week=
df.resample('W-Wed', on='Date').Public_Holiday.transform('max')
)
Date WeekNum Public_Holiday Public_Holiday_Week
0 2015-01-01 1 1 1
1 2015-01-02 1 0 1
2 2015-01-03 1 0 1
3 2015-01-04 1 0 1
4 2015-01-05 1 0 1
5 2015-01-06 1 0 1
6 2015-01-07 1 0 1
7 2015-01-08 2 0 0
8 2015-01-09 2 0 0
9 2015-01-10 2 0 0
10 2015-01-11 2 0 0
11 2015-01-12 2 0 0
12 2015-01-13 2 0 0
groupby
and max
, with map
:df['Public_Holiday_Week'] = df.WeekNum.map(df.groupby('WeekNum').Public_Holiday.max())
print(df)
Date WeekNum Public_Holiday Public_Holiday_Week
0 1/1/2015 1 1 1
1 2/1/2015 1 0 1
2 3/1/2015 1 0 1
3 4/1/2015 1 0 1
4 5/1/2015 1 0 1
5 6/1/2015 1 0 1
6 7/1/2015 1 0 1
7 8/1/2015 2 0 0
8 9/1/2015 2 0 0
9 10/1/2015 2 0 0
10 11/1/2015 2 0 0
11 12/1/2015 2 0 0
12 13/1/2015 2 0 0
groupby
and transform
, with max
df['Public_Holiday_Week'] = df.groupby('WeekNum').Public_Holiday.transform('max')
Thankfully, this will generalise nicely when grouping by month-year:
df['Public_Holiday_Week'] = (
df.groupby(['WeekNum', df.Date.str.split('/', 1).str[1]])
.Public_Holiday.transform('max')
)
print(df)
Date WeekNum Public_Holiday Public_Holiday_Week
0 1/1/2015 1 1 1
1 2/1/2015 1 0 1
2 3/1/2015 1 0 1
3 4/1/2015 1 0 1
4 5/1/2015 1 0 1
5 6/1/2015 1 0 1
6 7/1/2015 1 0 1
7 8/1/2015 2 0 0
8 9/1/2015 2 0 0
9 10/1/2015 2 0 0
10 11/1/2015 2 0 0
11 12/1/2015 2 0 0
12 13/1/2015 2 0 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With