Pandas : filter the rows based on a column containing lists

How can I filter the rows in a data frame based on the values of other columns?

I have the following data frame:

ip_df:
     class    name     marks          min_marks  min_subjects
0    I        tom      [89,85,80,74]  80         2
1    II       sam      [65,72,43,40]  85         1

Each row should be checked against its own "min_subjects" and "min_marks" values.

For index 0, "min_subjects" is 2, so at least 2 elements in the "marks" column must be greater than 80 (the "min_marks" value). The condition holds, so the new "flag" column should be 1.
For index 1, "min_subjects" is 1, so at least 1 element in the "marks" column must be greater than 85 (the "min_marks" value). The condition is not satisfied, so "flag" should be 0.

The final outcome should be:

op_df:
     class    name     marks          min_marks  min_subjects flag
0    I        tom      [89,85,80,74]  80         2            1
1    II       sam      [65,72,43,40]  85         1            0

Can anyone help me to achieve the same in the data frame?

Use a list comprehension with zip over the three columns: compare each mark with min_marks in a generator, sum the results to count the passing marks, then compare that count with min_subjects and convert the result to an integer:

df['flag'] = [1 if sum(x > c for x in a) >= b else 0 
                 for a, b, c in zip(df['marks'], df['min_subjects'], df['min_marks'])]

Alternatively, convert the boolean to 0 or 1 with int:

df['flag'] = [int(sum(x > c for x in a) >= b)
                 for a, b, c in zip(df['marks'], df['min_subjects'], df['min_marks'])]

Or a solution with numpy:

import numpy as np

df['flag'] = [int(np.sum(np.array(a) > c) >= b)
              for a, b, c in zip(df['marks'], df['min_subjects'], df['min_marks'])]

print(df)
  class name             marks  min_marks  min_subjects  flag
0     I  tom  [89, 85, 80, 74]         80             2     1
1    II  sam  [65, 72, 43, 40]         85             1     0
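
For anyone who wants to run the list-comprehension approach end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same logic. The helper name `count_flag` is hypothetical (it is not part of pandas), and the way the frame is constructed is an assumption, since the question only shows the printed frame.

```python
import pandas as pd

# Sample data copied from the question; how the real frame is built is an assumption.
df = pd.DataFrame({
    "class": ["I", "II"],
    "name": ["tom", "sam"],
    "marks": [[89, 85, 80, 74], [65, 72, 43, 40]],
    "min_marks": [80, 85],
    "min_subjects": [2, 1],
})


def count_flag(marks, min_marks, min_subjects):
    """Hypothetical helper: 1 if enough marks exceed min_marks, else 0."""
    # Count the marks that are strictly above the threshold.
    passed = sum(m > min_marks for m in marks)
    return int(passed >= min_subjects)


# Walk the three columns in parallel, one row at a time.
df["flag"] = [
    count_flag(a, c, b)
    for a, b, c in zip(df["marks"], df["min_subjects"], df["min_marks"])
]
print(df["flag"].tolist())  # [1, 0]
```

Note that the comparison is strict (`>`), so tom's mark of 80 does not count against a min_marks of 80. If ties should pass, `>=` would be the comparison to use instead.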

To avoid the Python-level loop and rely on vectorized operations instead, you can use the explode method (added in pandas 0.25.0), which turns each list element into its own row while repeating the original index:

df1 = df.explode('marks')
print(df1)

Output:

  class name marks  min_marks  min_subjects
0     I  tom    89         80             2
0     I  tom    85         80             2
0     I  tom    80         80             2
0     I  tom    74         80             2
1    II  sam    65         85             1
1    II  sam    72         85             1
1    II  sam    43         85             1
1    II  sam    40         85             1

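Compare marks with min_marks, count the passing marks per original row, and compare that count with min_subjects:

df['flag'] = (df1['marks'].gt(df1['min_marks'])    # is each mark above min_marks?
              .groupby(df1.index).sum()             # passing marks per original row
              .ge(df['min_subjects']).astype(int))  # enough passing marks -> 1, else 0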

print(df)

Output:

  class name             marks  min_marks  min_subjects  flag
0     I  tom  [89, 85, 80, 74]         80             2     1
1    II  sam  [65, 72, 43, 40]         85             1     0
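
The explode steps above can be put together into one runnable sketch. The frame construction is an assumption based on the printed sample, and the `pd.to_numeric` call is an addition not present in the answer: explode leaves "marks" as an object column, so converting it first keeps the comparison on a numeric dtype.

```python
import pandas as pd

# Sample data copied from the question; how the real frame is built is an assumption.
df = pd.DataFrame({
    "class": ["I", "II"],
    "name": ["tom", "sam"],
    "marks": [[89, 85, 80, 74], [65, 72, 43, 40]],
    "min_marks": [80, 85],
    "min_subjects": [2, 1],
})

# One row per mark; the original index label is repeated for every element.
exploded = df.explode("marks")

# explode yields an object column, so convert it to a numeric dtype first.
marks = pd.to_numeric(exploded["marks"])

# True where a mark is strictly above that row's min_marks.
passed = marks.gt(exploded["min_marks"])

# Count the True values per original row, then compare with min_subjects.
# The comparison aligns on the index, so each count meets its own row.
df["flag"] = passed.groupby(level=0).sum().ge(df["min_subjects"]).astype(int)
print(df["flag"].tolist())  # [1, 0]
```

Grouping by `level=0` here is equivalent to grouping by `df1.index` in the answer, because the exploded rows keep the index of the row they came from.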

Mahamutha M

2 Answers

jezrael

Mykola Zotko
