I've got a pandas DataFrame that looks like this:
      molecule            species
    0        a              [dog]
    1        b       [horse, pig]
    2        c         [cat, dog]
    3        d  [cat, horse, pig]
    4        e     [chicken, pig]
and I would like to extract a DataFrame containing only those rows whose species list contains any of the values in selection = ['cat', 'dog']. So the result should look like this:
      molecule            species
    0        a              [dog]
    1        c         [cat, dog]
    2        d  [cat, horse, pig]
What would be the simplest way to do this?
For testing:
    import pandas as pd

    selection = ['cat', 'dog']
    df = pd.DataFrame({'molecule': ['a', 'b', 'c', 'd', 'e'],
                       'species': [['dog'], ['horse', 'pig'], ['cat', 'dog'],
                                   ['cat', 'horse', 'pig'], ['chicken', 'pig']]})
You can select rows from a pandas DataFrame by a list of index positions using either DataFrame.iloc[] or DataFrame.loc[df.index[]].
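As a small illustration of that positional selection, here is a sketch using the test data from the question. The name `positions` is only a hypothetical example, hard-coded to the rows the question wants to keep.

```python
import pandas as pd

df = pd.DataFrame({
    'molecule': ['a', 'b', 'c', 'd', 'e'],
    'species': [['dog'], ['horse', 'pig'], ['cat', 'dog'],
                ['cat', 'horse', 'pig'], ['chicken', 'pig']],
})

# Hypothetical list of row positions to keep (chosen by hand here).
positions = [0, 2, 3]

# Select by integer position.
by_iloc = df.iloc[positions]

# Translate positions to index labels first, then select by label.
by_loc = df.loc[df.index[positions]]
```

Both forms return the same rows here. This only helps once the positions are already known; it does not by itself test what the lists contain.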
IIUC, you can re-create your df from the species lists and then use isin with any, which should be faster than apply:
    df[pd.DataFrame(df.species.tolist()).isin(selection).any(axis=1).values]
    Out[64]:
      molecule            species
    0        a              [dog]
    2        c         [cat, dog]
    3        d  [cat, horse, pig]
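To make the one-liner easier to follow, here is the same idea split into steps. This is only a sketch; the intermediate names `wide`, `hits` and `mask` are mine and are not part of the answer.

```python
import pandas as pd

selection = ['cat', 'dog']
df = pd.DataFrame({
    'molecule': ['a', 'b', 'c', 'd', 'e'],
    'species': [['dog'], ['horse', 'pig'], ['cat', 'dog'],
                ['cat', 'horse', 'pig'], ['chicken', 'pig']],
})

# Step 1: spread each list across columns, one row per original row.
# Shorter lists are padded with missing values.
wide = pd.DataFrame(df['species'].tolist())

# Step 2: element-wise membership test against the selection.
hits = wide.isin(selection)

# Step 3: a row matches if any of its cells matched.
# Converting to a plain array sidesteps index alignment with df.
mask = hits.any(axis=1).to_numpy()

result = df[mask]
```

The axis is passed by keyword because recent pandas versions no longer accept it positionally for any. Whether this beats apply on your data is worth timing, since the wide frame has as many columns as the longest list.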
You can use a mask with apply here.
    selection = ['cat', 'dog']
    # True for rows whose species list contains at least one selected item
    mask = df.species.apply(lambda x: any(item in x for item in selection))
    df1 = df[mask]
For the DataFrame you've provided as an example above, df1 will be:
      molecule            species
    0        a              [dog]
    2        c         [cat, dog]
    3        d  [cat, horse, pig]
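Note that the filtered frame keeps the original index labels (0, 2, 3), while the output asked for in the question is numbered 0, 1, 2. A possible variant, assuming that renumbering is what is wanted, uses a set for the membership test and resets the index afterwards. The name `wanted` is just illustrative.

```python
import pandas as pd

selection = ['cat', 'dog']
df = pd.DataFrame({
    'molecule': ['a', 'b', 'c', 'd', 'e'],
    'species': [['dog'], ['horse', 'pig'], ['cat', 'dog'],
                ['cat', 'horse', 'pig'], ['chicken', 'pig']],
})

# A set makes "do these share any element?" a single call.
wanted = set(selection)

# isdisjoint returns True when nothing is shared, so negate it.
mask = df['species'].apply(lambda xs: not wanted.isdisjoint(xs))

# drop=True discards the old labels instead of keeping them as a column.
df1 = df[mask].reset_index(drop=True)
```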