How to select rows in Pandas dataframe based on string matching in multiple columns

Q: How many rows are there in a DataFrame in Python?

Table 1 illustrates the output of the Python console and shows that our example data consists of six rows and three columns. This example shows how to get the rows of a pandas DataFrame that have a certain value in one of its columns. Here, we select all rows where column x3 is equal to 1.
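A minimal sketch of both ideas, assuming a hypothetical six-row frame with columns x1, x2 and x3 (the data behind Table 1 is not reproduced here). `len(df)` or `df.shape[0]` gives the row count, and a boolean comparison on a column selects the matching rows:

```python
import pandas as pd

# Hypothetical example data: six rows, three columns (not the original Table 1 data)
df = pd.DataFrame({
    "x1": [10, 20, 30, 40, 50, 60],
    "x2": ["a", "b", "c", "d", "e", "f"],
    "x3": [1, 2, 1, 3, 1, 2],
})

# Number of rows: len(df) and df.shape[0] are equivalent
n_rows = len(df)
print(n_rows, df.shape)  # 6 (6, 3)

# Keep only the rows where column x3 equals 1
subset = df[df["x3"] == 1]
print(subset)
```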

Q: How to select rows based on multiple column conditions in a pandas DataFrame?

Rows can be selected on multiple column conditions by combining them with the '&' operator. For example, the basic method selects all rows of a DataFrame in which 'Age' is equal to 21 and 'Stream' is present in a list of options.
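A sketch of that selection, assuming hypothetical data and a hypothetical `options` list. Each comparison is wrapped in parentheses because `&` binds more tightly than `==`, and `Series.isin` handles the "present in the options list" test:

```python
import pandas as pd

# Hypothetical example data; the column names follow the text above
df = pd.DataFrame({
    "Name": ["Ann", "Ben", "Cid", "Dee"],
    "Age": [21, 19, 21, 21],
    "Stream": ["Math", "Commerce", "Arts", "Science"],
})

options = ["Math", "Science"]  # hypothetical list of allowed streams

# Each condition is a boolean Series; & keeps rows where both are True
selected = df[(df["Age"] == 21) & (df["Stream"].isin(options))]
print(selected)
```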

Q: How to create a Boolean filter in a pandas DataFrame?

You can write a function to be applied to each value in the States/cities column. Have the function return either True or False, and the result of applying the function can act as a Boolean filter on your DataFrame. This is a common pattern when working with pandas.
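A minimal sketch of that pattern, assuming hypothetical values of the form "State/City" and a hypothetical helper `is_texas`. `Series.apply` calls the function on every value, and the resulting boolean Series is used to index the DataFrame:

```python
import pandas as pd

# Hypothetical data with a "States/cities" column, as named in the text above
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "States/cities": ["Texas/Austin", "Ohio/Columbus", "Texas/Dallas", "Utah/Provo"],
})

def is_texas(value):
    """Hypothetical filter: True when the 'State/City' string has Texas as its state."""
    return value.split("/")[0] == "Texas"

# apply returns one True/False per row, which acts as the Boolean filter
mask = df["States/cities"].apply(is_texas)
texas_rows = df[mask]
print(texas_rows)
```

For simple string tests like this one, a vectorized method such as `df["States/cities"].str.startswith("Texas/")` gives the same mask without a Python-level function call per row; `apply` is the more general tool when the logic does not map onto a built-in method.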

Tags: python, pandas, dataframe

I don't think this exact question has been answered yet, so here goes.

I have a Pandas data frame, and I want to select all rows that contain a string in column A or column B.

Say the dataframe looks like this:

import pandas

d = {'id':["1", "2", "3", "4"], 
     'title': ["Horses are good", "Cats are bad", "Frogs are nice", "Turkeys are the best"], 
     'description':["Horse epitome", "Cats bad", "Frog fancier, horses good", "Turkey tome"],
     'tags':["horse, cat", "horse, cat", "horse, frog", "turkey, horse"],
     'date':["2019-01-01", "2019-10-01", "2018-08-14", "2016-11-29"]}

dataframe = pandas.DataFrame(d)

Which gives:

id  title                   description                  tags             date
1   "Horses are good"       "Horse epitome"              "horse, cat"     2019-01-01
2   "Cats are bad"          "Cats bad"                   "horse, cat"     2019-10-01
3   "Frogs are nice"        "Frog fancier, horses good"  "horse, frog"    2018-08-14
4   "Turkeys are the best"  "Turkey tome"                "turkey, horse"  2016-11-29

Let's say I want to create a new dataframe containing rows with the string horse (ignoring capitalisation) in the column title OR the column description, without searching the column tags (or any other column).

The result should be (rows 2 and 4 get dropped):

id  title                   description                  tags             date
1   "Horses are good"       "Horse epitome"              "horse, cat"     2019-01-01
3   "Frogs are nice"        "Frog fancier, horses good"  "horse, frog"    2018-08-14

I have seen a few answers for a single column, such as:

dataframe[dataframe['title'].str.contains('horse')]

But I am not sure (1) how to add multiple columns to this statement and (2) how to modify it with something like str.lower() so that the string match ignores capitalisation in the column values.

Thanks in advance!

asked Oct 25 '19 13:10

arranjdavis

1 Answer

If you want to specify which columns to test, one possible solution is to join those columns and then test the result with Series.str.contains and case=False:

# join with a space so a match cannot span the boundary between the two columns
s = dataframe['title'] + ' ' + dataframe['description']
df = dataframe[s.str.contains('horse', case=False)]
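A self-contained sketch of this first solution, assuming sample data copied from the table in the question (tags and date are left out because they are not searched). It also shows that lowercasing with `Series.str.lower()` before `str.contains`, as the question suggests, selects the same rows as `case=False`:

```python
import pandas as pd

# Sample data from the table in the question (searched columns only)
dataframe = pd.DataFrame({
    "id": ["1", "2", "3", "4"],
    "title": ["Horses are good", "Cats are bad", "Frogs are nice", "Turkeys are the best"],
    "description": ["Horse epitome", "Cats bad", "Frog fancier, horses good", "Turkey tome"],
})

# Join the searched columns with a space separator
s = dataframe["title"] + " " + dataframe["description"]

# case=False makes the match case-insensitive
df = dataframe[s.str.contains("horse", case=False)]

# Equivalent: lowercase first, then match the lowercase pattern
df_lower = dataframe[s.str.lower().str.contains("horse")]

print(df)
```

Note that `str.contains` treats the pattern as a regular expression by default; passing `regex=False` is safer if the search string may contain characters such as `.` or `(`.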

Or create a condition for each column and chain them with the bitwise OR operator |:

df = dataframe[dataframe['title'].str.contains('horse', case=False) | 
               dataframe['description'].str.contains('horse', case=False)]
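With more than two columns, chaining `|` by hand gets long. One alternative (not part of the original answer) is to run the same test over a list of columns and combine the results per row with `DataFrame.any(axis=1)`. The `cols` list and the sample data are assumptions for illustration:

```python
import pandas as pd

dataframe = pd.DataFrame({
    "title": ["Horses are good", "Cats are bad", "Frogs are nice", "Turkeys are the best"],
    "description": ["Horse epitome", "Cats bad", "Frog fancier, horses good", "Turkey tome"],
    "tags": ["horse, cat", "horse, cat", "horse, frog", "turkey, horse"],
})

# Hypothetical list of columns to search; tags is deliberately left out
cols = ["title", "description"]

# One boolean column per searched column, then OR across each row
mask = dataframe[cols].apply(lambda col: col.str.contains("horse", case=False)).any(axis=1)
df = dataframe[mask]
print(df)
```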

If you also want to exclude rows where another column (here tags) matches, chain the conditions with the bitwise AND operator & and invert the unwanted match with ~ (NOT):

df = dataframe[s.str.contains('horse', case=False) &
               ~dataframe['tags'].str.contains('horse', case=False)]
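A sketch of this AND/NOT combination, using hypothetical tags values. In the question every tags value contains "horse", so there this variant would return an empty frame; the made-up tags below let the exclusion actually remove some rows and keep others:

```python
import pandas as pd

# Hypothetical data: unlike the question, only some tags mention horses
dataframe = pd.DataFrame({
    "title": ["Horses are good", "Cats are bad", "Frogs are nice", "Turkeys are the best"],
    "description": ["Horse epitome", "Cats bad", "Frog fancier, horses good", "Turkey tome"],
    "tags": ["horse, cat", "cat", "frog", "turkey"],
})

s = dataframe["title"] + " " + dataframe["description"]
in_text = s.str.contains("horse", case=False)                   # match in title or description
in_tags = dataframe["tags"].str.contains("horse", case=False)   # match in tags

# Keep rows that match the text columns AND do not match tags
df = dataframe[in_text & ~in_tags]
print(df)
```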

For the second solution, wrap the OR-chained conditions in parentheses, because & binds more tightly than |:

df = dataframe[(dataframe['title'].str.contains('horse', case=False) | 
                dataframe['description'].str.contains('horse', case=False)) &
               ~dataframe['tags'].str.contains('horse', case=False)]
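Why the parentheses matter: without them, Python parses `a | b & c` as `a | (b & c)`. A small sketch with three hypothetical boolean Series standing in for the title, description and not-in-tags conditions:

```python
import pandas as pd

a = pd.Series([True, False, False])   # e.g. title matches
b = pd.Series([False, True, True])    # e.g. description matches
c = pd.Series([False, False, True])   # e.g. tags do NOT match

# & binds more tightly than |, so this is a | (b & c)
unparenthesized = a | b & c
# Grouping the OR first gives the intended (a | b) & c
grouped = (a | b) & c

print(unparenthesized.tolist())  # [True, False, True]
print(grouped.tolist())          # [False, False, True]
```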

EDIT:

As @WeNYoBen commented, you can append DataFrame.copy to the end to prevent a SettingWithCopyWarning when you later modify the filtered frame:

s = dataframe['title'] + ' ' + dataframe['description']
df = dataframe[s.str.contains('horse', case=False)].copy()
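A short sketch of why the copy helps, assuming a hypothetical `matched` column is added afterwards. Because `df` is an independent copy, assigning to it leaves the original `dataframe` untouched and pandas has no reason to warn about setting values on a possible slice:

```python
import pandas as pd

dataframe = pd.DataFrame({
    "title": ["Horses are good", "Cats are bad", "Frogs are nice"],
    "description": ["Horse epitome", "Cats bad", "Frog fancier, horses good"],
})

s = dataframe["title"] + " " + dataframe["description"]
df = dataframe[s.str.contains("horse", case=False)].copy()

# Hypothetical follow-up assignment on the filtered copy
df["matched"] = True
print(df)
print("matched" in dataframe.columns)  # False
```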

answered Nov 02 '22 19:11

jezrael
