How to apply Pandas Groupby with multiple conditions for split and apply multiple calculations?

Question

I have a dataframe that I need to split based on 2 conditions and this conditions applied to 3 different columns, then using those new splits perform some calculations on other columns in a pairwise manner.

So I have a df A

A = pd.DataFrame([[1, 5, 2, 3, 4, 1], [2, 4, 4, 5, 5, 1], [3, 3, 1, 2, 5, 4], [4, 2, 2, 3, 5, 5], [5, 1, 4, 2, 3, 1]],
                 columns=['A', 'B', 'C', 'D', 'E', 'F'], index=[1, 2, 3, 4, 5])

I need to split the dataframe based on 2 conditions and applied to the first three columns, like this"

condition1_colA = A[A['A'] > 1]
condition2_colA = A[A['A'] < 1]
condition1_colB = A[A['B'] > 1]
condition2_colB = A[A['B'] < 1]
condition1_colC = A[A['C'] > 1]
condition2_colC = A[A['C'] < 1]

Now, I need to do 3 calculations on each of 6 new sliced dataframes, however this calculations are paired up in the following way:

condition1_colA and condition2_colA with column['D']
condition1_colB and condition2_colB with column['E']
condition1_colC and condition2_colC with column['F']

Here would be the example for the first pair:

x = condition1_colA['D'].mean()
y = condition1_colA['D'].std()
z = condition1_colA['D'][condition1_colA['D'] > 0].count()
x1 = condition2_colA['D'].mean()
y1 = condition2_colA['D'].std()
z1 = condition2_colA['D'][condition1_colA['D'] < 0].count()

Then we would do the same for the other 2 pairs. And organize the 18 calculations on a datframe. I know how to do this manually, but I'm hoping for some help doing it with groupby or perhaps pivot table.

Thanks

unutbu · Accepted Answer

import pandas as pd

A = pd.DataFrame(
    [[1, 5, 2, 3, 4, 1], [2, 4, 4, 5, 5, 1], [3, 3, 1, 2, 5, 4],
     [4, 2, 2, 3, 5, 5], [5, 1, 4, 2, 3, 1]],
    columns=['A', 'B', 'C', 'D', 'E', 'F'], index=[1, 2, 3, 4, 5])

result = []
pairs = []
for col1, col2 in zip(list('ABC'), list('DEF')):
    pairs.append(col1+col2)
    mask = A[col1] > 1
    mask2 = mask & (A[col2] > 0)
    ser = A.loc[mask, col2]
    result.append((ser.mean(), ser.std(), mask2.sum()))

    mask = A[col1] < 1
    mask2 = mask & (A[col2] < 0)
    ser = A.loc[mask, col2]
    result.append((ser.mean(), ser.std(), mask2.sum()))

index = pd.MultiIndex.from_product([pairs, [1, 2]], names=['pair', 'cond'])
df = pd.DataFrame(result, index=index, columns=['mean', 'sum', 'count'])
print(df)

yields

           mean       sum  count
pair cond                       
AD   1     3.00  1.414214      4
     2      NaN       NaN      0
BE   1     4.75  0.500000      4
     2      NaN       NaN      0
CF   1     2.00  2.000000      4
     2      NaN       NaN      0

How to apply Pandas Groupby with multiple conditions for split and apply multiple calculations?

Tags:

python

split

pandas

dataframe

subset

hernanavella

1 Answers

unutbu

Recent Activity

Donate For Us

How to apply Pandas Groupby with multiple conditions for split and apply multiple calculations?

Tags:

python

split

pandas

dataframe

subset

hernanavella

1 Answers

unutbu

Related questions

Recent Activity

Donate For Us