Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Search for String in all Pandas DataFrame columns and filter

Tags:

python

pandas

Thought this would be straight forward but had some trouble tracking down an elegant way to search all columns in a dataframe at same time for a partial string match. Basically how would I apply df['col1'].str.contains('^') to an entire dataframe at once and filter down to any rows that have records containing the match?

like image 439
horatio1701d Avatar asked Oct 29 '14 20:10

horatio1701d


People also ask

How do you filter a DataFrame based on a column value string?

Filter rows that match a given String in a column We will use the Series. isin([list_of_values] ) function from Pandas which returns a 'mask' of True for every element in the column that exactly matches or False if it does not match any of the list values in the isin() function.

How do you check if a pandas column contains a string?

Method 1: Use isin() function In this scenario, the isin() function check the pandas column containing the string present in the list and return the column values when present, otherwise it will not select the dataframe columns.


1 Answers

The Series.str.contains method expects a regex pattern (by default), not a literal string. Therefore str.contains("^") matches the beginning of any string. Since every string has a beginning, everything matches. Instead use str.contains("\^") to match the literal ^ character.

To check every column, you could use for col in df to iterate through the column names, and then call str.contains on each column:

mask = np.column_stack([df[col].str.contains(r"\^", na=False) for col in df]) df.loc[mask.any(axis=1)] 

Alternatively, you could pass regex=False to str.contains to make the test use the Python in operator; but (in general) using regex is faster.

like image 170
unutbu Avatar answered Sep 21 '22 11:09

unutbu