Select rows containing certain values from pandas dataframe

I have a pandas dataframe whose entries are all strings:

        A       B      C
    1   apple  banana   pear
    2    pear    pear  apple
    3  banana    pear   pear
    4   apple   apple   pear

etc. I want to select all the rows that contain a certain string, say, 'banana'. I don't know which column it will appear in each time. Of course, I can write a for loop and iterate over all rows. But is there an easier or faster way to do this?

424

asked Jul 04 '16 13:07

ylangylang

1 Answer

Introduction

At the heart of selecting rows, we need a 1D boolean mask (a NumPy array or a pandas Series) with one element per row of df; let's call it mask. Then df[mask] returns the selected rows of df via boolean indexing.

Here's our starting df :

    In [42]: df
    Out[42]:
            A       B      C
    1   apple  banana   pear
    2    pear    pear  apple
    3  banana    pear   pear
    4   apple   apple   pear
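To follow along, the frame can be rebuilt by hand. The snippet below is a minimal sketch: the constructor call and the hand-written mask are illustrative assumptions, not part of the original session, and the index is set to start at 1 only to match the printed output.

```python
import pandas as pd

# Rebuild the example frame; the index starts at 1 to match the printed output.
df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

# A hand-written boolean mask with one entry per row, aligned on the index.
mask = pd.Series([True, False, True, False], index=df.index)

# Boolean indexing keeps only the rows where the mask is True.
selected = df[mask]
print(selected)
```

The rest of the answer is about computing such a mask from the data instead of writing it by hand.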

I. Match one string

Now, if we need to match just one string, it's straightforward with elementwise equality:

    In [42]: df == 'banana'
    Out[42]:
           A      B      C
    1  False   True  False
    2  False  False  False
    3   True  False  False
    4  False  False  False

If we need to look for ANY match in each row, use the .any method along axis=1:

    In [43]: (df == 'banana').any(axis=1)
    Out[43]:
    1     True
    2    False
    3     True
    4    False
    dtype: bool

To select corresponding rows :

    In [44]: df[(df == 'banana').any(axis=1)]
    Out[44]:
            A       B     C
    1   apple  banana  pear
    3  banana    pear  pear
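The single-string steps can be bundled into a small helper. This is only a sketch: the function name rows_containing is hypothetical, and the frame is rebuilt here so the snippet runs on its own.

```python
import pandas as pd


def rows_containing(frame, value):
    """Return the rows of `frame` in which at least one cell equals `value`."""
    # Elementwise equality gives a boolean frame; any(axis=1) reduces it per row.
    mask = (frame == value).any(axis=1)
    return frame.loc[mask]


df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

banana_rows = rows_containing(df, "banana")
print(banana_rows)
```

Note that == compares whole cell values, so a cell holding 'banana split' would not count as a match for 'banana'.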

II. Match multiple strings

1. Search for ANY match

Here's our starting df :

    In [42]: df
    Out[42]:
            A       B      C
    1   apple  banana   pear
    2    pear    pear  apple
    3  banana    pear   pear
    4   apple   apple   pear

NumPy's np.isin works here (or use pandas' DataFrame.isin, as listed in other posts) to get all matches from the list of search strings in df. So, say we are looking for 'pear' or 'apple' in df:

    In [51]: np.isin(df, ['pear','apple'])
    Out[51]:
    array([[ True, False,  True],
           [ True,  True,  True],
           [False,  True,  True],
           [ True,  True,  True]])

    # ANY match along each row
    In [52]: np.isin(df, ['pear','apple']).any(axis=1)
    Out[52]: array([ True,  True,  True,  True])

    # Select corresponding rows with masking
    In [56]: df[np.isin(df, ['pear','apple']).any(axis=1)]
    Out[56]:
            A       B      C
    1   apple  banana   pear
    2    pear    pear  apple
    3  banana    pear   pear
    4   apple   apple   pear
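For comparison, here is a sketch of the pandas isin route mentioned above, using the DataFrame.isin method. The second search list, with 'kiwi', is a made-up example that is not in the data; it is included only to show a search that does not match every row.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

# DataFrame.isin returns a boolean frame; any(axis=1) yields a boolean Series.
mask = df.isin(["pear", "apple"]).any(axis=1)
pear_or_apple = df[mask]

# A narrower search: 'kiwi' never occurs, so only the 'banana' rows remain.
banana_or_kiwi = df[df.isin(["banana", "kiwi"]).any(axis=1)]
print(banana_or_kiwi)
```

Unlike the np.isin version, the mask here is a Series that keeps the index of df.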

2. Search for ALL matches

Here's our starting df again :

    In [42]: df
    Out[42]:
            A       B      C
    1   apple  banana   pear
    2    pear    pear  apple
    3  banana    pear   pear
    4   apple   apple   pear

So, now we are looking for rows that contain BOTH 'pear' and 'apple'. We will make use of NumPy broadcasting:

    In [66]: np.equal.outer(df.to_numpy(copy=False),  ['pear','apple']).any(axis=1)
    Out[66]:
    array([[ True,  True],
           [ True,  True],
           [ True, False],
           [ True,  True]])

We have a search list of 2 items, so the result is a 2D mask with number of rows = len(df) and number of columns = number of search items. In the above result, the first column is for 'pear' and the second one is for 'apple'.
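As a rough sketch of what the broadcasting step does, the snippet below spells out the intermediate shapes with explicit indexing in place of np.equal.outer. The names values, search, cell_hits and row_hits are illustrative, and passing dtype=object is an assumption made here so that the comparison stays elementwise on plain Python strings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

values = df.to_numpy(dtype=object)                   # shape (4, 3): rows x columns
search = np.array(["pear", "apple"], dtype=object)   # shape (2,)

# Adding a trailing axis makes (4, 3, 1) broadcast against (2,) to (4, 3, 2):
# one True/False per (row, column, search item).
cell_hits = values[:, :, None] == search
print(cell_hits.shape)

# Reducing over axis=1 (the columns of df) leaves one flag per (row, search item).
row_hits = cell_hits.any(axis=1)
print(row_hits.shape)
```

The per-cell array has len(df) * number of columns * number of search items elements, so memory use grows with the length of the search list.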

To make things concrete, let's get a mask for three items ['apple','banana', 'pear'] :

    In [62]: np.equal.outer(df.to_numpy(copy=False),  ['apple','banana', 'pear']).any(axis=1)
    Out[62]:
    array([[ True,  True,  True],
           [ True, False,  True],
           [False,  True,  True],
           [ True, False,  True]])

The columns of this mask are for 'apple','banana', 'pear' respectively.

Back to 2 search items case, we had earlier :

    In [66]: np.equal.outer(df.to_numpy(copy=False),  ['pear','apple']).any(axis=1)
    Out[66]:
    array([[ True,  True],
           [ True,  True],
           [ True, False],
           [ True,  True]])

Since we are looking for ALL matches in each row, reduce the 2D mask with .all along axis=1:

    In [67]: np.equal.outer(df.to_numpy(copy=False),  ['pear','apple']).any(axis=1).all(axis=1)
    Out[67]: array([ True,  True, False,  True])

Finally, select rows :

    In [70]: df[np.equal.outer(df.to_numpy(copy=False),  ['pear','apple']).any(axis=1).all(axis=1)]
    Out[70]:
           A       B      C
    1  apple  banana   pear
    2   pear    pear  apple
    4  apple   apple   pear
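The ALL-match steps can also be wrapped into a reusable function. The sketch below assumes a hypothetical helper name, rows_containing_all, and converts both inputs to object arrays (an assumption made here, not something the session above does) before applying the same np.equal.outer, .any and .all chain.

```python
import numpy as np
import pandas as pd


def rows_containing_all(frame, wanted):
    """Return the rows of `frame` that contain every value in `wanted`."""
    values = frame.to_numpy(dtype=object)
    wanted = np.asarray(list(wanted), dtype=object)
    # hits has shape (n_rows, n_cols, n_wanted): one flag per cell and search item.
    hits = np.equal.outer(values, wanted)
    # any over the columns, then all over the search items, gives one flag per row.
    mask = hits.any(axis=1).all(axis=1)
    return frame[mask]


df = pd.DataFrame(
    {
        "A": ["apple", "pear", "banana", "apple"],
        "B": ["banana", "pear", "pear", "apple"],
        "C": ["pear", "apple", "pear", "pear"],
    },
    index=[1, 2, 3, 4],
)

both = rows_containing_all(df, ["pear", "apple"])
all_three = rows_containing_all(df, ["apple", "banana", "pear"])
print(both)
print(all_three)
```

With the three-item search list, only row 1 qualifies, which agrees with the three-column mask shown earlier.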

answered Sep 24 '22 05:09

Divakar
