Pandas dataframe select rows where a list-column contains any of a list of strings

Tags: python, pandas, dataframe

I've got a pandas DataFrame that looks like this:

  molecule            species
0        a              [dog]
1        b       [horse, pig]
2        c         [cat, dog]
3        d  [cat, horse, pig]
4        e     [chicken, pig]

and I'd like to extract a DataFrame containing only those rows whose species list contains any of selection = ['cat', 'dog']. So the result should look like this:

  molecule            species
0        a              [dog]
1        c         [cat, dog]
2        d  [cat, horse, pig]

What would be the simplest way to do this?

For testing:

import pandas as pd

selection = ['cat', 'dog']
df = pd.DataFrame({'molecule': ['a', 'b', 'c', 'd', 'e'],
                   'species': [['dog'], ['horse', 'pig'], ['cat', 'dog'],
                               ['cat', 'horse', 'pig'], ['chicken', 'pig']]})


asked Nov 16 '18 17:11

NicoH


2 Answers

If I understand correctly: rebuild the species lists as their own DataFrame, then use isin with any, which should be faster than apply.

df[pd.DataFrame(df.species.tolist()).isin(selection).any(axis=1).values]
Out[64]:
  molecule            species
0        a              [dog]
2        c         [cat, dog]
3        d  [cat, horse, pig]
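To make the one-liner easier to follow, here is a sketch that splits it into named steps, using the test data from the question. The intermediate names `wide`, `hits`, `mask` and `result` are illustrative and not part of the original answer. The axis is passed as a keyword (`axis=1`) because recent pandas versions no longer accept it positionally for `any`.

```python
import pandas as pd

selection = ['cat', 'dog']
df = pd.DataFrame({
    'molecule': ['a', 'b', 'c', 'd', 'e'],
    'species': [['dog'], ['horse', 'pig'], ['cat', 'dog'],
                ['cat', 'horse', 'pig'], ['chicken', 'pig']],
})

# One column per list position; shorter lists are padded with missing values.
wide = pd.DataFrame(df['species'].tolist(), index=df.index)

# True wherever a cell equals one of the selected strings.
hits = wide.isin(selection)

# Collapse across columns: True if any position in the row matched.
mask = hits.any(axis=1)

# Boolean indexing keeps the original index labels (0, 2, 3).
result = df[mask]
print(result)
```

Because `wide` is built with `index=df.index`, the mask aligns with `df` directly, so the `.values` call from the one-liner is not needed here.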


answered Sep 19 '22 14:09

BENY

You can build a boolean mask with apply here.

selection = ['cat', 'dog']

# True for rows whose species list contains at least one selected item
mask = df.species.apply(lambda x: any(item in x for item in selection))
df1 = df[mask]

For the DataFrame you've provided as an example above, df1 will be:

  molecule            species
0        a              [dog]
2        c         [cat, dog]
3        d  [cat, horse, pig]
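Two other ways to build the same mask, offered as a sketch and not as part of either answer: a set-based test with `set.isdisjoint`, and `Series.explode` (available in pandas 0.25 and later) combined with `isin`. The names `wanted`, `mask_set`, `mask_explode` and `df2` are illustrative.

```python
import pandas as pd

selection = ['cat', 'dog']
df = pd.DataFrame({
    'molecule': ['a', 'b', 'c', 'd', 'e'],
    'species': [['dog'], ['horse', 'pig'], ['cat', 'dog'],
                ['cat', 'horse', 'pig'], ['chicken', 'pig']],
})

# Variant 1: a row matches when its list shares at least one element
# with the selection, i.e. the two are not disjoint.
wanted = set(selection)
mask_set = df['species'].apply(lambda species: not wanted.isdisjoint(species))

# Variant 2: explode gives one row per list element (the index is repeated),
# isin tests each element, and groupby(level=0).any() folds the result
# back to one boolean per original row.
mask_explode = df['species'].explode().isin(selection).groupby(level=0).any()

df2 = df[mask_set]
print(df2)
```

Both masks select the same rows for this data. Which one is faster depends on the number of rows and the length of the lists, so it is worth timing them on your own data.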


answered Sep 20 '22 14:09

Wes Doyle
