How to filter the rows in a data frame based on another column value?
I have a data frame which is,
ip_df:
   class  name  marks          min_marks  min_subjects
0  I      tom   [89,85,80,74]  80         2
1  II     sam   [65,72,43,40]  85         1
Each row should be flagged based on the values in its "min_subjects" and "min_marks" columns.
For index 0, "min_subjects" is 2, so at least 2 elements in the "marks" column must be greater than 80 (the "min_marks" value). That condition holds, so a new column named "flag" should be set to 1.
For index 1, "min_subjects" is 1, so at least 1 element in the "marks" column must be greater than 85 (the "min_marks" value). That condition is not satisfied, so "flag" should be set to 0.
The final outcome should be,
op_df:
   class  name  marks          min_marks  min_subjects  flag
0  I      tom   [89,85,80,74]  80         2             1
1  II     sam   [65,72,43,40]  85         1             0
Can anyone help me to achieve the same in the data frame?
For a simple condition on a single column, you can use an expression such as df[df["Courses"] == 'Spark'] to filter rows in a pandas DataFrame. Note that this expression returns a new DataFrame containing only the selected rows.
Use a list comprehension with zip over the three columns. For each row, compare every mark with min_marks in a generator, sum the matches to get a count, then compare that count with min_subjects and convert the result to an integer:
df['flag'] = [1 if sum(x > c for x in a) >= b else 0
for a, b, c in zip(df['marks'], df['min_subjects'], df['min_marks'])]
Alternatively, convert the boolean result to 0 or 1 with int:
df['flag'] = [int(sum(x > c for x in a) >= b)
for a, b, c in zip(df['marks'], df['min_subjects'], df['min_marks'])]
Or a solution with numpy:
import numpy as np

df['flag'] = [int(np.sum(np.array(a) > c) >= b)
              for a, b, c in zip(df['marks'], df['min_subjects'], df['min_marks'])]
print (df)
   class  name  marks             min_marks  min_subjects  flag
0  I      tom   [89, 85, 80, 74]  80         2             1
1  II     sam   [65, 72, 43, 40]  85         1             0
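For reference, here is a minimal self-contained sketch of the list-comprehension approach above. The DataFrame is rebuilt from the sample data in the question, and the helper name meets_requirement is only an illustrative choice, not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "class": ["I", "II"],
    "name": ["tom", "sam"],
    "marks": [[89, 85, 80, 74], [65, 72, 43, 40]],
    "min_marks": [80, 85],
    "min_subjects": [2, 1],
})


def meets_requirement(marks, min_marks, min_subjects):
    """Return 1 if at least `min_subjects` marks are strictly above `min_marks`.

    Hypothetical helper used only for this illustration.
    """
    passed = sum(mark > min_marks for mark in marks)
    return int(passed >= min_subjects)


# Walk the three columns in parallel, one row at a time.
df["flag"] = [
    meets_requirement(a, c, b)
    for a, b, c in zip(df["marks"], df["min_subjects"], df["min_marks"])
]

print(df["flag"].tolist())  # [1, 0]
```

Note that the comparison is strict, so a mark equal to min_marks (such as the 80 in the first row) does not count toward min_subjects.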
To avoid the explicit for loop and rely on vectorized pandas operations instead, you can use DataFrame.explode (added in pandas 0.25.0), which turns each list element into its own row:
df1 = df.explode('marks')
print(df1)
Output:
   class  name  marks  min_marks  min_subjects
0  I      tom   89     80         2
0  I      tom   85     80         2
0  I      tom   80     80         2
0  I      tom   74     80         2
1  II     sam   65     85         1
1  II     sam   72     85         1
1  II     sam   43     85         1
1  II     sam   40     85         1
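Compare the marks and min_marks columns, count the passing marks per original row by grouping on the index, then compare that count with min_subjects: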
df['flag'] = df1['marks'].gt(df1['min_marks'])\
.groupby(df1.index).sum().ge(df['min_subjects']).astype(int)
print(df)
Output:
   class  name  marks             min_marks  min_subjects  flag
0  I      tom   [89, 85, 80, 74]  80         2             1
1  II     sam   [65, 72, 43, 40]  85         1             0
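The explode approach can also be written as one runnable sketch. It assumes pandas 0.25.0 or newer and rebuilds the sample frame from the question. The intermediate names exploded, above and passed are illustrative, and the explicit astype(int) cast is an added precaution because explode leaves the marks column with object dtype.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "class": ["I", "II"],
    "name": ["tom", "sam"],
    "marks": [[89, 85, 80, 74], [65, 72, 43, 40]],
    "min_marks": [80, 85],
    "min_subjects": [2, 1],
})

# One row per mark; the original row label is repeated in the index.
exploded = df.explode("marks")

# Cast the exploded marks to integers, then test each against its row's minimum.
above = exploded["marks"].astype(int).gt(exploded["min_marks"])

# Count the passing marks per original row (index level 0).
passed = above.groupby(level=0).sum()

# A row is flagged when the count reaches its min_subjects threshold.
df["flag"] = passed.ge(df["min_subjects"]).astype(int)

print(df["flag"].tolist())  # [1, 0]
```

Because the exploded rows keep the original index labels, the grouped counts align with df by index when they are compared with min_subjects.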