Survived SibSp Parch
0 0 1 0
1 1 1 0
2 1 0 0
3 1 1 0
4 0 0 1
Given the above dataframe, is there an elegant way to groupby
with a condition?
I want to split the data into two groups based on the following conditions:
(df['SibSp'] > 0) | (df['Parch'] > 0) = New Group -"Has Family"
(df['SibSp'] == 0) & (df['Parch'] == 0) = New Group - "No Family"
then take the means of both of these groups and end up with an output like this:
SurvivedMean
Has Family Mean
No Family Mean
Can it be done using groupby or would I have to append a new column using the above conditional statement?
An easy way to group that is to use the sum of those two columns. If either of them is positive, the result will be greater than 1. And groupby accepts an arbitrary array as long as the length is the same as the DataFrame's length so you don't need to add a new column.
family = np.where((df['SibSp'] + df['Parch']) >= 1 , 'Has Family', 'No Family')
df.groupby(family)['Survived'].mean()
Out:
Has Family 0.5
No Family 1.0
Name: Survived, dtype: float64
Use only one condition if never values in columns SibSp
and Parch
are less as 0
:
m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)
df = df.groupby(np.where(m1, 'Has Family', 'No Family'))['Survived'].mean()
print (df)
Has Family 0.5
No Family 1.0
Name: Survived, dtype: float64
If is impossible use first use both conditions:
m1 = (df['SibSp'] > 0) | (df['Parch'] > 0)
m2 = (df['SibSp'] == 0) & (df['Parch'] == 0)
a = np.where(m1, 'Has Family',
np.where(m2, 'No Family', 'Not'))
df = df.groupby(a)['Survived'].mean()
print (df)
Has Family 0.5
No Family 1.0
Name: Survived, dtype: float64
You could define your conditions in a list and use the function group_by_condition
below to create a filtered list for each condition. Afterwards you can select the resulting items using pattern matching:
df = [
{"Survived": 0, "SibSp": 1, "Parch": 0},
{"Survived": 1, "SibSp": 1, "Parch": 0},
{"Survived": 1, "SibSp": 0, "Parch": 0}]
conditions = [
lambda x: (x['SibSp'] > 0) or (x['Parch'] > 0), # has family
lambda x: (x['SibSp'] == 0) and (x['Parch'] == 0) # no family
]
def group_by_condition(l, conditions):
return [[item for item in l if condition(item)] for condition in conditions]
[has_family, no_family] = group_by_condition(df, conditions)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With