
Filter pandas DataFrame by substring criteria

I have a pandas DataFrame with a column of string values. I need to select rows based on partial string matches.

Something like this idiom:

re.search(pattern, cell_in_question)  

returning a boolean. I am familiar with the syntax of df[df['A'] == "hello world"] but can't seem to find a way to do the same with a partial string match, say 'hello'.
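For reference, the `re.search` idiom from the question can be applied row by row with `Series.apply`, which turns it into a boolean mask usable in the same way as `df['A'] == "hello world"`. This is a minimal sketch that assumes a hypothetical column `A` holding plain strings with no missing values; it calls Python once per row, so it is slower than a vectorized string method.

```python
import re

import pandas as pd

# Hypothetical sample data; the column name 'A' follows the question.
df = pd.DataFrame({"A": ["hello world", "goodbye world", "say hello", "hi there"]})

pattern = "hello"

# re.search returns a Match object or None, so "is not None" gives one bool per row.
mask = df["A"].apply(lambda cell: re.search(pattern, cell) is not None)

# Boolean indexing keeps only the rows where the mask is True.
matches = df[mask]
print(matches)
```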

euforia asked Jul 05 '12 18:07


People also ask

How do I select a substring in pandas?

Using `contains` to find a substring in a pandas DataFrame: the `Series.str.contains` method searches a column for a given substring (a regular expression by default). It returns a boolean Series that is True where the original value contains the substring and False where it does not.
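A short sketch of the boolean Series that `str.contains` produces, using a hypothetical column that includes a missing value. Passing `na=False` guarantees that missing entries come out as False, so the result can be used directly as a row mask; `case=False` makes the match case-insensitive.

```python
import pandas as pd

# Hypothetical column; None stands in for a missing value.
s = pd.Series(["hello world", "goodbye", None, "Hello again"])

# Case-sensitive search; na=False maps missing values to False.
mask = s.str.contains("hello", na=False)

# case=False also matches "Hello".
mask_ci = s.str.contains("hello", case=False, na=False)

print(mask.tolist())     # [True, False, False, False]
print(mask_ci.tolist())  # [True, False, False, True]
```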


1 Answer

Based on GitHub issue #620, it looks like you'll soon be able to do the following:

df[df['A'].str.contains("hello")] 
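A self-contained version of the one-liner above, on a hypothetical two-column frame. The mask from `str.contains` selects matching rows, and negating it with `~` selects the rows that do not match.

```python
import pandas as pd

# Hypothetical frame; 'A' is the string column from the question.
df = pd.DataFrame({
    "A": ["hello world", "goodbye world", "say hello", "hi there"],
    "B": [1, 2, 3, 4],
})

# Rows whose 'A' value contains "hello".
filtered = df[df["A"].str.contains("hello")]

# ~ inverts the boolean mask to keep the non-matching rows.
excluded = df[~df["A"].str.contains("hello")]

print(filtered)
```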

Update: vectorized string methods (i.e., Series.str) are available in pandas 0.8.1 and up.
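One detail worth knowing when using `Series.str.contains` in recent pandas versions: the pattern is treated as a regular expression by default, so characters such as `.` act as wildcards. The sketch below, on hypothetical version strings, contrasts the default with `regex=False`, which matches the pattern as a literal substring.

```python
import pandas as pd

# Hypothetical values chosen to show the difference.
s = pd.Series(["v1.2", "v1x2", "release 1.2"])

# Default regex=True: '.' matches any character, so "v1x2" also matches.
as_regex = s.str.contains("1.2")

# regex=False: "1.2" must appear literally.
as_literal = s.str.contains("1.2", regex=False)

print(as_regex.tolist())    # [True, True, True]
print(as_literal.tolist())  # [True, False, True]
```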

Garrett answered Oct 12 '22 23:10