How to use str.contains() with multiple expressions, in pandas dataframes?

I'm wondering if there is a more efficient way to use the str.contains() function in Pandas, to search for two partial strings at once. I want to search a given column in a dataframe for data that contains either "nt" or "nv". Right now, my code looks like this:

    df[df['Behavior'].str.contains("nt", na=False)]
    df[df['Behavior'].str.contains("nv", na=False)]

And then I append one result to another. What I'd like to do is use a single line of code to search for any data that includes "nt" OR "nv" OR "nf." I've played around with some ways that I thought should work, including just sticking a pipe between terms, but all of these result in errors. I've checked the documentation, but I don't see this as an option. I get errors like this:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-113-1d11e906812c> in <module>()
          3 
          4 
    ----> 5 soctol = f_recs[f_recs['Behavior'].str.contains("nt"|"nv", na=False)]
          6 soctol

    TypeError: unsupported operand type(s) for |: 'str' and 'str'

Is there a fast way to do this? Thanks for any help, I am a beginner but am LOVING pandas for data wrangling.

M.A.Kline asked Oct 03 '13 21:10



1 Answer

The alternatives should be a single regular expression, inside a single string:

    "nt|nv"  # rather than "nt" | "nv"
    f_recs[f_recs['Behavior'].str.contains("nt|nv", na=False)]

Python doesn't let you use the or (|) operator on strings:

    In [1]: "nt" | "nv"
    TypeError: unsupported operand type(s) for |: 'str' and 'str'
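
To tie this together, here is a minimal sketch assuming a small made-up DataFrame (the sample values and the `terms` list are hypothetical, since the question's data isn't shown). It builds the pattern from a list, using `re.escape` in case a term ever contains regex metacharacters, and also shows that `|` does work element-wise between the boolean Series that `str.contains` returns:

```python
import re

import pandas as pd

# Hypothetical sample data; the asker's real 'Behavior' column is not shown.
f_recs = pd.DataFrame({"Behavior": ["nt-approach", "nv-look", "nf-feed", "groom", None]})

# Join the search terms into one regular expression: "nt|nv|nf".
terms = ["nt", "nv", "nf"]
pattern = "|".join(re.escape(term) for term in terms)

# na=False counts missing values as non-matches.
soctol = f_recs[f_recs["Behavior"].str.contains(pattern, na=False)]

# Equivalent: | is defined for boolean Series, so separate masks can be combined.
mask = (
    f_recs["Behavior"].str.contains("nt", na=False)
    | f_recs["Behavior"].str.contains("nv", na=False)
    | f_recs["Behavior"].str.contains("nf", na=False)
)
print(soctol.equals(f_recs[mask]))
```

The single-pattern version is probably the one to reach for, since it scans the column once and grows easily as you add terms. The mask version is mostly useful when the conditions aren't all simple substring checks.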

Andy Hayden answered Oct 25 '22 14:10