 

Selecting dataframe by comparing multiple columns in pandas

I have a pandas dataframe and want to select the rows where some columns have some specific value. For example, for one column I tried this:

import pandas as pd

df = pd.DataFrame({
    'subA': [54,98,70,91,38],
    'subB': [25,26,30,93,30],
    'subC': [43,89,56,50,48]})


a = df[df['subA'] == 70]
print(a)

The output was as follows:

   subA  subB  subC
2    70    30    56

This is expected and understandable. Now I want to select the rows where the first two columns have specific values. For example, I changed the code as follows:

df = pd.DataFrame({
    'subA': [54,98,70,91,38],
    'subB': [25,26,30,93,30],
    'subC': [43,89,56,50,48]})

my_sub = ['subA', 'subB']
my_marks = [54, 25]


a = df[df[my_sub] == my_marks]
print(a)

I was expecting to see results like this:

   subA  subB  subC
0    54    25    43

But instead the output is full of NaN values, which I do not understand:

   subA  subB  subC
0  54.0  25.0   NaN
1   NaN   NaN   NaN
2   NaN   NaN   NaN
3   NaN   NaN   NaN
4   NaN   NaN   NaN

What am I missing to get the desired output? I also tried .loc and .iloc, but those did not help.

asked Dec 31 '22 21:12 by EMT


1 Answer

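Indexing with a boolean DataFrame masks individual cells instead of selecting rows, which is where the NaN values come from. Reduce the comparison to one boolean per row with all(axis=1) and use that for boolean indexing: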

df[(df[my_sub] == my_marks).all(axis=1)]
   subA  subB  subC
0    54    25    43
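
To see why the extra all(axis=1) matters, it may help to look at the intermediate objects. The following is only a sketch that rebuilds the frame from the question; the names mask, masked, row_mask and result are illustrative and not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({
    'subA': [54, 98, 70, 91, 38],
    'subB': [25, 26, 30, 93, 30],
    'subC': [43, 89, 56, 50, 48]})

my_sub = ['subA', 'subB']
my_marks = [54, 25]

# Element-wise comparison: a boolean DataFrame shaped like df[my_sub].
mask = df[my_sub] == my_marks

# Indexing with a boolean DataFrame keeps the shape of df and puts NaN
# wherever the mask is not True. subC is absent from the mask, so the
# whole column becomes NaN. This reproduces the output in the question.
masked = df[mask]

# Reducing across columns gives one boolean per row (a Series),
# and indexing with a boolean Series selects whole rows.
row_mask = mask.all(axis=1)
result = df[row_mask]
print(result)
```

Since row_mask is a plain boolean Series, df.loc[row_mask] should select the same rows.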

Or use eq with all, as @ansev suggested:

df[df[my_sub].eq(my_marks).all(axis=1)]
   subA  subB  subC
0    54    25    43
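
If the column/value pairs are easier to keep together, a variation on the same idea is to compare against a Series built from a dict, so the values are matched to columns by label rather than by position. This is a sketch under that assumption; the wanted dict and the variable names are hypothetical and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'subA': [54, 98, 70, 91, 38],
    'subB': [25, 26, 30, 93, 30],
    'subC': [43, 89, 56, 50, 48]})

# Hypothetical mapping of column name to the value it must have.
wanted = {'subA': 54, 'subB': 25}
cols = list(wanted)

# eq aligns the Series index with the column labels,
# then all(axis=1) keeps only rows where every column matches.
selected = df[df[cols].eq(pd.Series(wanted)).all(axis=1)]

# The same selection written out by hand with & for comparison.
by_hand = df[(df['subA'] == 54) & (df['subB'] == 25)]

print(selected)
```

For just two or three conditions the hand-written & form is probably the most readable; the dict form mainly helps when the list of columns is built at runtime.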

answered Jan 06 '23 02:01 by Dishin H Goyani