Combined aggregate based on valid values

I have a df with this structure:

id  a1_l1   a2_l1   a3_l1   a1_l2   a2_l2   a3_l2
1   1       5       3       1       2       3
2   1       5       3       1       2       3
3   2       5       3       5       5       3
4   5       5       3       5       5       3
5   5       5       2           
6   5       5       2           
7   5       5       2           
8   2       5       2           
9   3       5       1           
10  3       5       1   

I want to summarize in a table such that I get:

    l1  l2
a1  0.4 0.5
a2  1   0.5
a3  0   0

What I'm doing is counting how many times 5 appears in each column and dividing by the number of valid (non-missing) responses in that column. For example:

a1, l1 equals 0.4 because there are 4 values of 5 out of 10 valid responses, and a2, l2 equals 0.5 because there are 2 values of 5 out of the 4 valid responses in that column.

Thanks!

asked Nov 07 '21 16:11 by EGM8686



2 Answers

You can reshape the columns into a MultiIndex, then divide the number of values equal to 5 by the number of non-NA values in each column. Finally, unstack:

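# Use id as the index so that only the response columns remain
df2 = df.set_index('id')
# Split 'a1_l1' into the two column levels ('a1', 'l1')
df2.columns = df2.columns.str.split('_', expand=True)
# Count of 5s divided by count of non-NA values per column, pivoted so l1/l2 become columns
df2 = (df2.eq(5).sum() / df2.notna().sum()).unstack()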

output:

     l1   l2
a1  0.4  0.5
a2  1.0  0.5
a3  0.0  0.0
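
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. It assumes the blank cells in the question are NaN; the names `wide`, `fives`, `valid` and `result` are only illustrative.

```python
import numpy as np
import pandas as pd

# Rebuild the question's data; blank cells are assumed to be NaN.
nan = np.nan
df = pd.DataFrame({
    "id": range(1, 11),
    "a1_l1": [1, 1, 2, 5, 5, 5, 5, 2, 3, 3],
    "a2_l1": [5] * 10,
    "a3_l1": [3, 3, 3, 3, 2, 2, 2, 2, 1, 1],
    "a1_l2": [1, 1, 5, 5] + [nan] * 6,
    "a2_l2": [2, 2, 5, 5] + [nan] * 6,
    "a3_l2": [3, 3, 3, 3] + [nan] * 6,
})

wide = df.set_index("id")
# Split "a1_l1" into the two-level column label ("a1", "l1").
wide.columns = wide.columns.str.split("_", expand=True)

fives = wide.eq(5).sum()    # NaN == 5 is False, so missing cells add nothing
valid = wide.notna().sum()  # non-missing responses per column
result = (fives / valid).unstack()
print(result)
```

Keeping `fives` and `valid` as separate Series makes it easy to inspect the numerator and denominator for each column before dividing.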
answered Oct 07 '22 11:10 by mozway


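Try pd.wide_to_long, dividing by count() so that only the valid (non-missing) responses are used as the denominator:

s = pd.wide_to_long(df, ['a1', 'a2', 'a3'], i='id', j='level', sep='_', suffix=r'\w+')
# Number of 5s per level and item
out = s.eq(5).groupby(level=1).sum()
# Divide by the number of non-missing responses; size() would also count the NaN rows
out = (out / s.groupby(level=1).count()).T
out
level   l1   l2
a1     0.4  0.5
a2     1.0  0.5
a3     0.0  0.0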
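
The choice of denominator matters with this reshaping: `groupby(...).size()` counts every row in a group, including rows whose values are missing, whereas `count()` counts only the non-missing entries in each column. A minimal sketch of the difference on the question's data, assuming the blank cells are NaN (the names `long_df`, `by_rows` and `by_valid` are only illustrative):

```python
import numpy as np
import pandas as pd

# Rebuild the question's data; blank cells are assumed to be NaN.
nan = np.nan
df = pd.DataFrame({
    "id": range(1, 11),
    "a1_l1": [1, 1, 2, 5, 5, 5, 5, 2, 3, 3],
    "a2_l1": [5] * 10,
    "a3_l1": [3, 3, 3, 3, 2, 2, 2, 2, 1, 1],
    "a1_l2": [1, 1, 5, 5] + [nan] * 6,
    "a2_l2": [2, 2, 5, 5] + [nan] * 6,
    "a3_l2": [3, 3, 3, 3] + [nan] * 6,
})

long_df = pd.wide_to_long(
    df, ["a1", "a2", "a3"], i="id", j="level", sep="_", suffix=r"\w+"
)

fives = long_df.eq(5).groupby(level="level").sum()

# size(): rows per level, including the rows that hold only NaN
rows = long_df.groupby(level="level").size()
# count(): non-missing values per level and per column
valid = long_df.groupby(level="level").count()

by_rows = fives.div(rows, axis=0).T  # denominator is 10 for both l1 and l2
by_valid = (fives / valid).T         # denominator is 10 for l1 and 4 for l2

print(by_rows)
print(by_valid)
```

Only the `count()` version divides the l2 columns by their 4 valid responses, which is what the question asks for.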
answered Oct 07 '22 12:10 by BENY