Filter all rows that do not contain letters (alpha) in pandas

I am trying to filter a pandas dataframe using regular expressions. I want to delete those rows that do not contain any letters. For example:

Col A.
50000
$927848
dog
cat 583
rabbit 444

My desired result is:

Col A.
dog
cat 583
rabbit 444

I have been trying, unsuccessfully, to solve this problem with regex and the pandas filter options. See below. I am specifically running into problems when I try to merge two conditions for the filter. How can I achieve this?

Option 1:

df['Col A.'] = ~df['Col A.'].filter(regex='\d+')

Option 2:

df['Col A.'] = df['Col A.'].filter(regex='\w+')

Option 3:

from string import digits, letters
df['Col A.'] = (df['Col A.'].filter(regex='|'.join(letters)))

OR

df['Col A.'] = ~(df['Col A.'].filter(regex='|'.join(digits)))

OR

df['Col A.'] = df[~(df['Col A.'].filter(regex='|'.join(digits))) & (df['Col A.'].filter(regex='|'.join(letters)))]
owwoow14 asked May 02 '18 12:05


2 Answers

I think you need str.contains with boolean indexing to keep only the values that contain letters:

df = df[df['Col A.'].str.contains('[A-Za-z]')]
print(df)
       Col A.
2         dog
3     cat 583
4  rabbit 444
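
As a self-contained check, here is a minimal sketch that rebuilds the column from the question's sample data (assuming all values are stored as strings) and contrasts the Series.filter attempts from the question with str.contains. Series.filter applies its regex to the index labels rather than to the cell values, which is why those attempts do not select the expected rows.

```python
import pandas as pd

# Sample data copied from the question; all values are assumed to be strings.
df = pd.DataFrame({"Col A.": ["50000", "$927848", "dog", "cat 583", "rabbit 444"]})

# Series.filter matches the regex against the index labels (0..4 here),
# not the cell values, so no label contains a letter and the result is empty.
by_label = df["Col A."].filter(regex="[A-Za-z]")

# str.contains tests every value and returns a boolean mask,
# which can then be used for boolean indexing.
has_letter = df["Col A."].str.contains("[A-Za-z]")
result = df[has_letter]

print(result)  # rows 2, 3 and 4: dog, cat 583, rabbit 444
```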

If the column contains NaN values, you can pass na=False so that they are treated as non-matches:

df = df[df['Col A.'].str.contains('[A-Za-z]', na=False)]
print(df)
       Col A.
3         dog
4     cat 583
5  rabbit 444
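
The question also asks about merging two conditions. One way to do that, sketched below, is to build one boolean mask per condition with str.contains and combine them with the element-wise operators & (and), | (or) and ~ (not). The sample Series, including the position of the missing value, is a hypothetical illustration and not data from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the question's values plus one missing entry.
s = pd.Series(
    ["50000", np.nan, "$927848", "dog", "cat 583", "rabbit 444"],
    name="Col A.",
)

# One mask per condition; na=False makes missing values count as "no match".
has_letter = s.str.contains("[A-Za-z]", na=False)
has_digit = s.str.contains(r"\d", na=False)

# Combine masks element-wise. Parenthesize any comparison you inline here,
# because & and | bind more tightly than ==, < and similar operators.
letters_only = s[has_letter & ~has_digit]
letters_and_digits = s[has_letter & has_digit]

print(letters_only.tolist())        # ['dog']
print(letters_and_digits.tolist())  # ['cat 583', 'rabbit 444']
```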
jezrael answered Sep 24 '22 04:09


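Have you tried str.contains with one of these patterns? (Series.filter matches index labels, not values, so it cannot be used here.)

df[df['Col A.'].str.contains(r'\D')]  # Keeps rows with at least one non-digit character

or:

df[df['Col A.'].str.contains(r'[A-Za-z]')]  # Keeps rows with at least one ASCII letter (alpha)

or:

df[df['Col A.'].str.contains(r'[^\W\d_]')]  # Also matches non-ASCII letters; more info in the link below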

Explanation: https://stackoverflow.com/a/2039476/8933502
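
To illustrate how the [A-Za-z] and [^\W\d_] patterns differ, here is a small sketch on a hypothetical Series that includes non-ASCII text. The patterns are applied with str.contains, since Series.filter matches index labels rather than values. In Python 3, [^\W\d_] matches word characters that are neither digits nor underscores, which is roughly "any Unicode letter".

```python
import pandas as pd

# Hypothetical values: digits only, Cyrillic text, ASCII text,
# underscores only, and CJK text.
s = pd.Series(["50000", "кот 12", "dog", "___", "日本 7"])

# ASCII letters only: misses the Cyrillic and CJK rows.
ascii_mask = s.str.contains("[A-Za-z]")

# Word characters minus digits and underscore: also accepts non-ASCII letters.
unicode_mask = s.str.contains(r"[^\W\d_]")

print(s[ascii_mask].tolist())    # ['dog']
print(s[unicode_mask].tolist())  # ['кот 12', 'dog', '日本 7']
```

Which pattern is appropriate depends on whether non-English text should count as "letters" in your data.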

Samuel GIFFARD answered Sep 22 '22 04:09