I don't think this exact question has been answered yet, so here goes.
I have a Pandas data frame, and I want to select all rows that contain a string in column A or column B.
Say the dataframe looks like this:
import pandas

d = {'id': ["1", "2", "3", "4"],
     'title': ["Horses are good", "Cats are bad", "Frogs are nice", "Turkeys are the best"],
     'description': ["Horse epitome", "Cats bad but horses good", "Frog fancier", "Turkey tome, not about horses"],
     'tags': ["horse, cat, frog, turkey", "horse, cat, frog, turkey", "horse, cat, frog, turkey", "horse, cat, frog, turkey"],
     'date': ["2019-01-01", "2019-10-01", "2018-08-14", "2016-11-29"]}
dataframe = pandas.DataFrame(d)
Which gives:
id title description tag date
1 "Horses are good" "Horse epitome" "horse, cat" 2019-01-01
2 "Cats are bad" "Cats bad" "horse, cat" 2019-10-01
3 "Frogs are nice" "Frog fancier, horses good" "horse, frog" 2018-08-14
4 "Turkey are best" "Turkey tome" "turkey, horse" 2016-11-29
Let's say I want to create a new dataframe containing the rows that have the string horse (ignoring capitalisation) in the column title OR the column description, but without searching the column tag (or any other column).
The result should be (rows 2 and 4 are dropped):
id title description tag date
1 "Horses are good" "Horse epitome" "horse, cat" 2019-01-01
3 "Frogs are nice" "Frog fancier, horses good" "horse, frog" 2018-08-14
I have seen a few answers for a single column, such as:
dataframe[dataframe['title'].str.contains('horse')]
But I am not sure (1) how to add multiple columns to this statement and (2) how to modify it with something like string.lower() so that capitals in the column values are ignored in the string match.
Thanks in advance!
If you want to specify which columns to test, one possible solution is to join those columns and then test the result with Series.str.contains and case=False:
# join with a space so a match cannot span the boundary between the two columns
s = dataframe['title'] + ' ' + dataframe['description']
df = dataframe[s.str.contains('horse', case=False)]
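One thing to watch with the joined approach is missing values: if either column holds a missing value, the concatenated string for that row is missing too. The following is a minimal sketch of one way to guard against that, assuming a made-up frame called frame with one missing description (the data and variable names are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical sample data: the row with id "2" has no description.
frame = pd.DataFrame({
    "id": ["1", "2", "3"],
    "title": ["Horses are good", "Cats are bad", "Frogs are nice"],
    "description": ["Horse epitome", None, "Frog fancier, horses good"],
})

# Replace missing values with empty strings before joining, and put a
# space between the columns so a match cannot span the column boundary.
joined = frame["title"].fillna("") + " " + frame["description"].fillna("")

# na=False treats any remaining missing value as "no match".
mask = joined.str.contains("horse", case=False, na=False)
result = frame[mask]
print(result["id"].tolist())
```

Here fillna("") and na=False are both defensive; either one alone may be enough depending on the data.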
Alternatively, create a condition for each column and chain the conditions with the bitwise OR operator |:
df = dataframe[dataframe['title'].str.contains('horse', case=False) |
dataframe['description'].str.contains('horse', case=False)]
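When there are more than two columns to search, writing one condition per column gets repetitive. As a rough sketch of the same OR logic, the conditions can be built for a list of columns and combined with any(axis=1). The frame below uses the values from the table printed in the question; the names frame and columns_to_search are just illustrative:

```python
import pandas as pd

# Values taken from the table printed in the question.
frame = pd.DataFrame({
    "id": ["1", "2", "3", "4"],
    "title": ["Horses are good", "Cats are bad", "Frogs are nice", "Turkey are best"],
    "description": ["Horse epitome", "Cats bad", "Frog fancier, horses good", "Turkey tome"],
    "tag": ["horse, cat", "horse, cat", "horse, frog", "turkey, horse"],
})

# Only these columns are searched; "tag" is deliberately left out.
columns_to_search = ["title", "description"]

# One boolean column per searched column: True where the text contains "horse".
per_column = frame[columns_to_search].apply(
    lambda col: col.str.contains("horse", case=False, na=False)
)

# Keep a row if any of the searched columns matched (logical OR across columns).
mask = per_column.any(axis=1)
result = frame[mask]
print(result["id"].tolist())
```

Because the tag column is never tested, the matches it contains have no effect on which rows are kept.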
If you also want to exclude rows that match in another column, chain that condition with bitwise AND (&) and invert it with ~ (NOT):
df = dataframe[s.str.contains('horse', case=False) &
~dataframe['tags'].str.contains('horse', case=False)]
For the second solution, wrap the conditions chained by OR in parentheses, because & is evaluated before |:
df = dataframe[(dataframe['title'].str.contains('horse', case=False) |
                dataframe['description'].str.contains('horse', case=False)) &
               ~dataframe['tags'].str.contains('horse', case=False)]
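To see why the parentheses matter, here is a small sketch with hand-written boolean Series standing in for the three conditions (the values are invented for illustration). In Python, ~ is applied first, then &, then |, so leaving the parentheses out changes which rows pass:

```python
import pandas as pd

# Hypothetical masks for three rows.
in_title = pd.Series([True, False, False])
in_description = pd.Series([False, True, False])
in_tags = pd.Series([True, True, False])

# Without parentheses this is read as: in_title | (in_description & ~in_tags),
# so the tags exclusion only applies to the description condition.
without_parens = in_title | in_description & ~in_tags

# With parentheses the tags exclusion applies to both conditions.
with_parens = (in_title | in_description) & ~in_tags

print(without_parens.tolist())  # [True, False, False]
print(with_parens.tolist())     # [False, False, False]
```

The first row matches in the title but also in the tags, so it is only excluded when the OR group is parenthesised.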
EDIT:
As @WeNYoBen commented, you can add DataFrame.copy to the end to prevent a SettingWithCopyWarning later on:
s = dataframe['title'] + ' ' + dataframe['description']
df = dataframe[s.str.contains('horse', case=False)].copy()
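As a small illustration of what the copy gives you, the sketch below filters a made-up two-row frame, takes an explicit copy, and then adds a column to it. The names frame, subset and checked are hypothetical:

```python
import pandas as pd

# Hypothetical two-row frame.
frame = pd.DataFrame({
    "id": ["1", "2"],
    "title": ["Horses are good", "Cats are bad"],
})

# Filter, then take an explicit, independent copy of the result.
subset = frame[frame["title"].str.contains("horse", case=False)].copy()

# Assigning to the copy modifies only the copy; the original frame
# keeps its two columns.
subset["checked"] = True

print(list(subset.columns))
print(list(frame.columns))
```

The explicit copy makes it unambiguous that later assignments are meant for the filtered result and not for the original dataframe.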