 

How to combine multiple queries on a pandas DataFrame?

I want to combine multiple queries. Here is my data frame:

import pandas as pd

data = {'Name':['Penny','Ben','Benny','Mark','Ben1','Ben2','Ben3'], 
        'Eng':[5,1,4,3,1,2,3], 
        'Math':[1,5,3,2,2,2,3],
        'Physics':[2,5,3,1,1,2,3],
        'Sports':[4,5,2,3,1,2,3],
        'Total':[12,16,12,9,5,8,12],
        'Group':['A','A','A','A','A','B','B']}

df1=pd.DataFrame(data, columns=['Name','Eng','Math','Physics','Sports','Total','Group']) 
df1

I have 3 queries:

  1. Group A or B
  2. Math > Eng
  3. Name starts with 'B'

I tried applying them one by one:

df1[df1.Name.str.startswith('B')]
df1.query('Math > Eng')
df1[df1.Group == 'A']  # I cannot run the code with df1[df1.Group == 'A' or 'B']

Then I tried to combine those queries into one:

df1.query("'Math > Eng' & 'df1[df1.Name.str.startswith('B')]' & 'df1[df1.Group == 'A']")
TokenError: ('EOF in multi-line statement', (2, 0))

I also tried passing str.startswith() into df.query():

df1.query("df1.Name.str.startswith('B')")
UndefinedVariableError: name 'df1' is not defined

I have tried many approaches, but none of them works. How can I combine these queries?

aukk123 asked Nov 12 '20 02:11


2 Answers

The long way to solve this, and the most transparent one (so best for beginners), is to create a boolean column for each filter. Then combine those columns into one final filter that is True only where every individual filter is True:

df1['filter_1'] = df1['Group'].isin(['A','B'])
df1['filter_2'] = df1['Math'] > df1['Eng']
df1['filter_3'] = df1['Name'].str.startswith('B')

# True only for rows where all three filters are True
df1['filter_final'] = df1[['filter_1', 'filter_2', 'filter_3']].all(axis=1)
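To see what the helper columns look like, here is a self-contained run of the steps above on the question's data. Printing the filter columns next to Name makes it easy to check why each row is kept or dropped; the print call is only there for inspection.

```python
import pandas as pd

# The question's data
data = {'Name': ['Penny', 'Ben', 'Benny', 'Mark', 'Ben1', 'Ben2', 'Ben3'],
        'Eng': [5, 1, 4, 3, 1, 2, 3],
        'Math': [1, 5, 3, 2, 2, 2, 3],
        'Physics': [2, 5, 3, 1, 1, 2, 3],
        'Sports': [4, 5, 2, 3, 1, 2, 3],
        'Total': [12, 16, 12, 9, 5, 8, 12],
        'Group': ['A', 'A', 'A', 'A', 'A', 'B', 'B']}
df1 = pd.DataFrame(data)

# One boolean column per condition
df1['filter_1'] = df1['Group'].isin(['A', 'B'])
df1['filter_2'] = df1['Math'] > df1['Eng']
df1['filter_3'] = df1['Name'].str.startswith('B')

# Row-wise AND: True only where every filter column is True
df1['filter_final'] = df1[['filter_1', 'filter_2', 'filter_3']].all(axis=1)

# Show each row's individual results next to the combined one
print(df1[['Name', 'filter_1', 'filter_2', 'filter_3', 'filter_final']])
```

Only Ben and Ben1 are True in all three columns. Ben2 and Ben3 fail filter_2 because their Math score equals, rather than exceeds, their Eng score.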

You can certainly combine these steps into one:

mask = (df1['Group'].isin(['A','B']) &
        (df1['Math'] > df1['Eng']) &
        df1['Name'].str.startswith('B'))

df1['filter_final'] = mask

Finally, select the rows that satisfy the filter:

df_filtered = df1[df1['filter_final']]

This selects the rows of df1 where filter_final is True.
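A minimal sketch of the single-mask variant on the question's data: indexing with the mask directly, instead of storing it as a column first, leaves df1 without extra helper columns. Using .loc here is just one equivalent spelling of df1[mask].

```python
import pandas as pd

# The question's data
df1 = pd.DataFrame({
    'Name': ['Penny', 'Ben', 'Benny', 'Mark', 'Ben1', 'Ben2', 'Ben3'],
    'Eng': [5, 1, 4, 3, 1, 2, 3],
    'Math': [1, 5, 3, 2, 2, 2, 3],
    'Physics': [2, 5, 3, 1, 1, 2, 3],
    'Sports': [4, 5, 2, 3, 1, 2, 3],
    'Total': [12, 16, 12, 9, 5, 8, 12],
    'Group': ['A', 'A', 'A', 'A', 'A', 'B', 'B'],
})

# Combine the three conditions element-wise with &.
# The comparison needs its own parentheses because &
# binds more tightly than > in Python.
mask = (
    df1['Group'].isin(['A', 'B'])
    & (df1['Math'] > df1['Eng'])
    & df1['Name'].str.startswith('B')
)

# Index with the mask directly; df1 gets no helper columns
df_filtered = df1.loc[mask]
print(df_filtered)
```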

Yaakov Bressler answered Oct 13 '22 17:10


Firstly, the answer is:

df1.query("Math > Eng & Name.str.startswith('B') & Group=='A'")
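A self-contained sketch of the query approach on the question's data. It differs from the one-liner above in two assumed ways: it uses Group in ['A', 'B'] to match the question's "Group A or B" condition, and it passes engine='python' as a precaution, since whether string-method calls such as .str.startswith are accepted under the default engine may depend on the pandas version and on whether numexpr is installed.

```python
import pandas as pd

# The question's data
df1 = pd.DataFrame({
    'Name': ['Penny', 'Ben', 'Benny', 'Mark', 'Ben1', 'Ben2', 'Ben3'],
    'Eng': [5, 1, 4, 3, 1, 2, 3],
    'Math': [1, 5, 3, 2, 2, 2, 3],
    'Physics': [2, 5, 3, 1, 1, 2, 3],
    'Sports': [4, 5, 2, 3, 1, 2, 3],
    'Total': [12, 16, 12, 9, 5, 8, 12],
    'Group': ['A', 'A', 'A', 'A', 'A', 'B', 'B'],
})

# Inside query, columns are referenced by name alone (Math, not df1.Math).
# engine='python' is passed as a precaution for the .str method call.
result = df1.query(
    "(Math > Eng) & (Group in ['A', 'B']) & Name.str.startswith('B')",
    engine='python',
)
print(result[['Name', 'Eng', 'Math', 'Group']])
```

With this data it keeps Ben and Ben1, the same rows as the boolean-mask approach in the other answer.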

Additional comments

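  1. Inside query, refer to columns by name alone, without the data frame's name (write Name, not df1.Name).
  2. Use df1[df1.Group.isin(['A', 'B'])] or df1.query("Group in ['A', 'B']") instead of df1[df1.Group == 'A' or 'B']. The or version fails because Python's or operator needs a single True/False value for the whole Series df1.Group == 'A', and pandas raises a ValueError instead.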
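A minimal sketch of comment 2 on the question's data. It also shows a third element-wise spelling with | (not part of the original answer). Every row in the question's data is in group A or B, so each working variant keeps all seven rows; the point is that the three agree and that the or version raises.

```python
import pandas as pd

# The question's data
df1 = pd.DataFrame({
    'Name': ['Penny', 'Ben', 'Benny', 'Mark', 'Ben1', 'Ben2', 'Ben3'],
    'Eng': [5, 1, 4, 3, 1, 2, 3],
    'Math': [1, 5, 3, 2, 2, 2, 3],
    'Physics': [2, 5, 3, 1, 1, 2, 3],
    'Sports': [4, 5, 2, 3, 1, 2, 3],
    'Total': [12, 16, 12, 9, 5, 8, 12],
    'Group': ['A', 'A', 'A', 'A', 'A', 'B', 'B'],
})

# Python's `or` calls bool() on the Series df1.Group == 'A',
# which pandas refuses with a ValueError ("truth value ... is ambiguous").
try:
    df1[df1.Group == 'A' or 'B']
    or_raised = False
except ValueError as exc:
    or_raised = True
    print('ValueError:', exc)

# Three element-wise ways to keep rows in group A or group B
by_isin = df1[df1.Group.isin(['A', 'B'])]
by_pipe = df1[(df1.Group == 'A') | (df1.Group == 'B')]
by_query = df1.query("Group in ['A', 'B']")

print(by_isin.equals(by_pipe), by_isin.equals(by_query))
```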
tomo_iris427 answered Oct 13 '22 16:10