Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Marking the entire group if condition is true for a single row

I have a dataframe which has Dates and public holidays

Date    WeekNum Public_Holiday
1/1/2015    1   1
2/1/2015    1   0
3/1/2015    1   0
4/1/2015    1   0
5/1/2015    1   0
6/1/2015    1   0
7/1/2015    1   0
8/1/2015    2   0
9/1/2015    2   0
10/1/2015   2   0
11/1/2015   2   0
12/1/2015   2   0
13/1/2015   2   0

I have to create a conditional column named Public_Holiday_Week, which should return 1, if that particular week has a public holiday

And I want to see an output like this

Date    WeekNum Public_Holiday  Public_Holiday_Week
1/1/2015    1   1               1
2/1/2015    1   0               1
3/1/2015    1   0               1
4/1/2015    1   0               1
5/1/2015    1   0               1
6/1/2015    1   0               1
7/1/2015    1   0               1
8/1/2015    2   0               0
9/1/2015    2   0               0
10/1/2015   2   0               0
11/1/2015   2   0               0
12/1/2015   2   0               0
13/1/2015   2   0               0

I tried using np.where

df['Public_Holiday_Week'] = np.where(df['Public_Holiday']==1,1,0)

But it applies 0 for other days of the week when it is not a public holiday.

Do I have to apply rolling here? Appreciate your help

like image 620
Ahamed Moosa Avatar asked Jun 25 '18 05:06

Ahamed Moosa


3 Answers

For improve performance dont use groupby, rather get all WeekNum with at least one 1 and then select values by isin, last cast boolean mask to ints:

weeks = df.loc[df['Public_Holiday'].eq(1), 'WeekNum']
df['Public_Holiday_Week'] = df['WeekNum'].isin(weeks).astype(int)

print (df)
         Date  WeekNum  Public_Holiday  Public_Holiday_Week
0    1/1/2015        1               1                    1
1    2/1/2015        1               0                    1
2    3/1/2015        1               0                    1
3    4/1/2015        1               0                    1
4    5/1/2015        1               0                    1
5    6/1/2015        1               0                    1
6    7/1/2015        1               0                    1
7    8/1/2015        2               0                    0
8    9/1/2015        2               0                    0
9   10/1/2015        2               0                    0
10  11/1/2015        2               0                    0
11  12/1/2015        2               0                    0
12  13/1/2015        2               0                    0

As pointed @Mohamed Thasin ah if necessary is possible groupby by week, but then get different output, because different week numbers:

df['weeks'] = pd.to_datetime(df['Date'], dayfirst=True).dt.week

weeks = df.loc[df['Public_Holiday'].eq(1), 'weeks']
df['Public_Holiday_Week'] = df['weeks'].isin(weeks).astype(int)
print (df)
         Date  WeekNum  Public_Holiday  weeks  Public_Holiday_Week
0    1/1/2015        1               1      1                    1
1    2/1/2015        1               0      1                    1
2    3/1/2015        1               0      1                    1
3    4/1/2015        1               0      1                    1
4    5/1/2015        1               0      2                    0
5    6/1/2015        1               0      2                    0
6    7/1/2015        1               0      2                    0
7    8/1/2015        2               0      2                    0
8    9/1/2015        2               0      2                    0
9   10/1/2015        2               0      2                    0
10  11/1/2015        2               0      2                    0
11  12/1/2015        2               0      3                    0
12  13/1/2015        2               0      3                    0
like image 71
jezrael Avatar answered Oct 26 '22 20:10

jezrael


Use resample and skip the use of the WeekNum column altogether.

df.assign(
    Public_Holiday_Week=
    df.resample('W-Wed', on='Date').Public_Holiday.transform('max')
)

         Date  WeekNum  Public_Holiday  Public_Holiday_Week
0  2015-01-01        1               1                    1
1  2015-01-02        1               0                    1
2  2015-01-03        1               0                    1
3  2015-01-04        1               0                    1
4  2015-01-05        1               0                    1
5  2015-01-06        1               0                    1
6  2015-01-07        1               0                    1
7  2015-01-08        2               0                    0
8  2015-01-09        2               0                    0
9  2015-01-10        2               0                    0
10 2015-01-11        2               0                    0
11 2015-01-12        2               0                    0
12 2015-01-13        2               0                    0
like image 44
piRSquared Avatar answered Oct 26 '22 19:10

piRSquared


groupby and max, with map:

df['Public_Holiday_Week'] = df.WeekNum.map(df.groupby('WeekNum').Public_Holiday.max())
print(df)
         Date  WeekNum  Public_Holiday  Public_Holiday_Week
0    1/1/2015        1               1                    1
1    2/1/2015        1               0                    1
2    3/1/2015        1               0                    1
3    4/1/2015        1               0                    1
4    5/1/2015        1               0                    1
5    6/1/2015        1               0                    1
6    7/1/2015        1               0                    1
7    8/1/2015        2               0                    0
8    9/1/2015        2               0                    0
9   10/1/2015        2               0                    0
10  11/1/2015        2               0                    0
11  12/1/2015        2               0                    0
12  13/1/2015        2               0                    0

groupby and transform, with max

df['Public_Holiday_Week'] = df.groupby('WeekNum').Public_Holiday.transform('max')

Thankfully, this will generalise nicely when grouping by month-year:

df['Public_Holiday_Week'] = (
     df.groupby(['WeekNum', df.Date.str.split('/', 1).str[1]])
      .Public_Holiday.transform('max')
)
print(df)
         Date  WeekNum  Public_Holiday  Public_Holiday_Week
0    1/1/2015        1               1                    1
1    2/1/2015        1               0                    1
2    3/1/2015        1               0                    1
3    4/1/2015        1               0                    1
4    5/1/2015        1               0                    1
5    6/1/2015        1               0                    1
6    7/1/2015        1               0                    1
7    8/1/2015        2               0                    0
8    9/1/2015        2               0                    0
9   10/1/2015        2               0                    0
10  11/1/2015        2               0                    0
11  12/1/2015        2               0                    0
12  13/1/2015        2               0                    0
like image 21
cs95 Avatar answered Oct 26 '22 19:10

cs95