
Combined aggregate based on valid values

Tags: python, pandas, dataframe

I have a df with this structure:

id  a1_l1   a2_l1   a3_l1   a1_l2   a2_l2   a3_l2
1   1       5       3       1       2       3
2   1       5       3       1       2       3
3   2       5       3       5       5       3
4   5       5       3       5       5       3
5   5       5       2           
6   5       5       2           
7   5       5       2           
8   2       5       2           
9   3       5       1           
10  3       5       1

I want to summarize in a table such that I get:

    l1  l2
a1  0.4 0.5
a2  1   0.5
a3  0   0

For each cell, I count how many times the value 5 appears in the column and divide by the number of valid (non-missing) responses in that column. For example:

a1, l1 equals 0.4 because 4 of its 10 values are 5, and a2, l2 equals 0.5 because 2 of its 4 valid responses are 5.
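To make the rule concrete, here is a minimal sketch for a single column. It assumes the blank cells in the table above are stored as NaN; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Column a2_l2 from the example: four valid responses followed by six missing ones.
a2_l2 = pd.Series([2, 2, 5, 5] + [np.nan] * 6, name="a2_l2")

n_fives = a2_l2.eq(5).sum()    # NaN never compares equal to 5, so only real 5s count
n_valid = a2_l2.notna().sum()  # number of non-missing responses
share = n_fives / n_valid
print(share)  # 0.5
```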

Thanks!


asked Nov 07 '21 16:11

EGM8686

2 Answers

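You can reshape the columns into a MultiIndex, then divide the number of values equal to 5 (eq(5).sum()) by the number of non-missing values (notna().sum()) in each column. Finally, unstack so that l1 and l2 become the columns:

df2 = df.set_index('id')
# Split 'a1_l1' into ('a1', 'l1') so the columns become a two-level MultiIndex
df2.columns = df2.columns.str.split('_', expand=True)
# Share of 5s among non-missing values per column, then move the l1/l2 level to columns
df2 = (df2.eq(5).sum() / df2.notna().sum()).unstack()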

output:

     l1   l2
a1  0.4  0.5
a2  1.0  0.5
a3  0.0  0.0
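For anyone who wants to reproduce this, the following is a self-contained sketch of the same reshape-and-divide approach. The DataFrame construction is an assumption: it rebuilds the question's table by hand and treats the blank cells as NaN.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data copied from the question; blanks are assumed to be NaN.
df = pd.DataFrame({
    "id": range(1, 11),
    "a1_l1": [1, 1, 2, 5, 5, 5, 5, 2, 3, 3],
    "a2_l1": [5] * 10,
    "a3_l1": [3, 3, 3, 3, 2, 2, 2, 2, 1, 1],
    "a1_l2": [1, 1, 5, 5] + [nan] * 6,
    "a2_l2": [2, 2, 5, 5] + [nan] * 6,
    "a3_l2": [3, 3, 3, 3] + [nan] * 6,
})

wide = df.set_index("id")
# 'a1_l1' -> ('a1', 'l1'): two-level column index
wide.columns = wide.columns.str.split("_", expand=True)
# Per column: count of 5s divided by count of non-missing values
result = (wide.eq(5).sum() / wide.notna().sum()).unstack()
print(result)
```

Because NaN never compares equal to 5, the missing cells drop out of the numerator automatically, and notna() keeps them out of the denominator.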


answered Oct 07 '22 11:10

mozway

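Try pd.wide_to_long, dividing by count() (which skips missing values) rather than size() (which counts every row):

s = pd.wide_to_long(df, ['a1', 'a2', 'a3'], i='id', j='level', sep='_', suffix='\\w+')
out = s.eq(5).groupby(level=1).sum()
# count() ignores NaN, so the denominator is the number of valid responses
out = out.div(s.groupby(level=1).count()).T
out

output:

level   l1   l2
a1     0.4  0.5
a2     1.0  0.5
a3     0.0  0.0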
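One detail worth checking with the wide_to_long approach is the denominator. groupby(...).size() counts every row in the group, including rows that are entirely NaN, while groupby(...).count() counts only non-missing values per column. The sketch below compares the two on the question's data; the DataFrame is rebuilt by hand with blanks assumed to be NaN, and the names by_size and by_count are illustrative.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Sample data copied from the question; blanks are assumed to be NaN.
df = pd.DataFrame({
    "id": range(1, 11),
    "a1_l1": [1, 1, 2, 5, 5, 5, 5, 2, 3, 3],
    "a2_l1": [5] * 10,
    "a3_l1": [3, 3, 3, 3, 2, 2, 2, 2, 1, 1],
    "a1_l2": [1, 1, 5, 5] + [nan] * 6,
    "a2_l2": [2, 2, 5, 5] + [nan] * 6,
    "a3_l2": [3, 3, 3, 3] + [nan] * 6,
})

# Long format: one row per (id, level), one column per item a1..a3
s = pd.wide_to_long(df, ["a1", "a2", "a3"], i="id", j="level",
                    sep="_", suffix=r"\w+")
fives = s.eq(5).groupby(level="level").sum()

# size(): rows per level, NaN rows included (10 for both l1 and l2)
by_size = fives.div(s.groupby(level="level").size(), axis=0).T
# count(): non-missing values per level and column (4 for the l2 items)
by_count = fives.div(s.groupby(level="level").count()).T

print(by_size)
print(by_count)
```

Only the count() version divides by the number of valid responses, which is what the question asks for.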

answered Oct 07 '22 12:10

BENY
