Select column values satisfying multiple conditions in other columns

I have a pandas dataframe such as:

    Species     Pathway     Number of Gene Families
1    uniSU2     ABC           1.0
2    uniSU2     Wzy           11.0
3    uniSU2     Synthase      2.0
4    n116       Wzy           0.0   
5    n116       ABC           4.0
7    n116       Synthase      14.0
8    Aullax     ABC           9.0
9    Aullax     Synthase      1.0
10   Aullax     Wzy           2.0
11   Criepi     Wzy           0.0
12   Criepi     ABC           2.0
13   Criepi     Synthase      3.0

I want to select the Species (1st column) that have all three possible pathways, ABC, Wzy and Synthase (2nd column). A species only counts as having a pathway if its Number of Gene Families (3rd column) is positive (> 0), so all three conditions must hold: ABC > 0, Wzy > 0 and Synthase > 0.
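One way to express this condition is to reshape the frame so that each pathway becomes its own column, then keep the species whose three columns are all positive. This is a minimal sketch, assuming the column names shown above; `df` is rebuilt here from the sample rows, reading the `Aulax` row as `Aullax` (which the expected result implies). A pathway that is missing entirely for a species is treated as a count of 0.

```python
import pandas as pd

# Sample rows from the question (assumption: 'Aulax' is a typo for 'Aullax').
df = pd.DataFrame({
    'Species': ['uniSU2', 'uniSU2', 'uniSU2', 'n116', 'n116', 'n116',
                'Aullax', 'Aullax', 'Aullax', 'Criepi', 'Criepi', 'Criepi'],
    'Pathway': ['ABC', 'Wzy', 'Synthase', 'Wzy', 'ABC', 'Synthase',
                'ABC', 'Synthase', 'Wzy', 'Wzy', 'ABC', 'Synthase'],
    'Number of Gene Families': [1.0, 11.0, 2.0, 0.0, 4.0, 14.0,
                                9.0, 1.0, 2.0, 0.0, 2.0, 3.0],
})

required = ['ABC', 'Wzy', 'Synthase']

# One row per species, one column per pathway; missing combinations become 0.
wide = df.pivot_table(index='Species', columns='Pathway',
                      values='Number of Gene Families',
                      aggfunc='sum', fill_value=0)

# Make sure every required pathway has a column, even if it never occurs.
wide = wide.reindex(columns=required, fill_value=0)

# A species qualifies only if every required pathway count is > 0.
mask = (wide > 0).all(axis=1)
res = pd.DataFrame({'Species': wide.index[mask.to_numpy()]})
print(res)
```

Because `pivot_table` sorts its index, the species come out in alphabetical order (`Aullax`, then `uniSU2`).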

The results for this subset of my dataframe would be:

Species 
uniSU2
Aullax

I think this gets me halfway:

geneCount_stacked.loc[geneCount_stacked['Number of Gene Families'] > 0, ['Species','Pathway']] 

But I can't work out how to move forward from here.
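The `.loc` filter above is a reasonable first step: once only rows with a positive count remain, a species qualifies if its remaining rows still cover all three pathways. A sketch of that continuation, assuming the stacked frame is called `geneCount_stacked` as in the question (rebuilt here from the sample rows, again reading `Aulax` as `Aullax`):

```python
import pandas as pd

# Sample rows from the question (assumption: 'Aulax' is a typo for 'Aullax').
rows = [
    ('uniSU2', 'ABC', 1.0), ('uniSU2', 'Wzy', 11.0), ('uniSU2', 'Synthase', 2.0),
    ('n116', 'Wzy', 0.0), ('n116', 'ABC', 4.0), ('n116', 'Synthase', 14.0),
    ('Aullax', 'ABC', 9.0), ('Aullax', 'Synthase', 1.0), ('Aullax', 'Wzy', 2.0),
    ('Criepi', 'Wzy', 0.0), ('Criepi', 'ABC', 2.0), ('Criepi', 'Synthase', 3.0),
]
geneCount_stacked = pd.DataFrame(
    rows, columns=['Species', 'Pathway', 'Number of Gene Families'])

required = {'ABC', 'Wzy', 'Synthase'}

# Step 1 (the question's own filter): keep only rows with a positive count.
positive = geneCount_stacked.loc[
    geneCount_stacked['Number of Gene Families'] > 0, ['Species', 'Pathway']]

# Step 2: restrict to the required pathways and count distinct ones per species.
n_found = (positive[positive['Pathway'].isin(required)]
           .groupby('Species')['Pathway'].nunique())

# Step 3: a species qualifies if all three pathways survived the filter.
res = n_found[n_found == len(required)].index.to_frame(index=False)
print(res)
```

`nunique` counts distinct pathways, so a duplicated row for the same pathway cannot make up for a missing one.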

Many thanks in advance!

Zez asked Sep 14 '25 11:09


1 Answer

Try this:

import pandas as pd
res = pd.DataFrame({'Species': [x for x, y in df.groupby('Species') if len({'ABC', 'Wzy', 'Synthase'} & set(y.Pathway)) == 3 and all(y['Number of Gene Families'] > 0)]})

Output

  Species
0  Aullax
1  uniSU2
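A variant of the same per-group test uses `groupby(...).filter(...)`, which keeps the original rows of every qualifying species (handy if the counts are still needed afterwards). This is a sketch, not the answerer's code: `has_all_pathways` is a hypothetical helper name, and `df` is rebuilt from the question's sample rows with `Aulax` read as `Aullax`. One behavioral difference: the answer's `all(...)` requires every row of a species to be positive, whereas this version only checks the three named pathways, so an extra pathway with a zero count would not exclude a species. That only matters if the real data contains more than these three pathways.

```python
import pandas as pd

# Sample rows from the question (assumption: 'Aulax' is a typo for 'Aullax').
rows = [
    ('uniSU2', 'ABC', 1.0), ('uniSU2', 'Wzy', 11.0), ('uniSU2', 'Synthase', 2.0),
    ('n116', 'Wzy', 0.0), ('n116', 'ABC', 4.0), ('n116', 'Synthase', 14.0),
    ('Aullax', 'ABC', 9.0), ('Aullax', 'Synthase', 1.0), ('Aullax', 'Wzy', 2.0),
    ('Criepi', 'Wzy', 0.0), ('Criepi', 'ABC', 2.0), ('Criepi', 'Synthase', 3.0),
]
df = pd.DataFrame(rows, columns=['Species', 'Pathway', 'Number of Gene Families'])

required = {'ABC', 'Wzy', 'Synthase'}

def has_all_pathways(group):
    """Hypothetical helper: True if the group's positive rows cover every required pathway."""
    positive = group.loc[group['Number of Gene Families'] > 0, 'Pathway']
    return required.issubset(positive)

# filter() returns the original rows of each species for which the test is True.
matching_rows = df.groupby('Species').filter(has_all_pathways)

# unique() keeps the order of first appearance in the original frame.
res = pd.DataFrame({'Species': matching_rows['Species'].unique()})
print(res)
```

Since `filter` preserves the original row order, the species appear as `uniSU2` then `Aullax`, matching the order in the question's expected result.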
deadshot answered Sep 17 '25 01:09