
Find columns where at least one row contains an alphabetical letter

Tags: python, pandas

Let's say I have the following data set:

import pandas as pd

df = pd.DataFrame(
        {'A': [1, 2, 3],
         'B': ['one', 2, 3],
         'C': [4, 5, '6Y']
         })

I would like to find out, without any cumbersome for loop, which columns contain at least one value with an alphabetical letter (here: B and C). I guess the result should be either a list of booleans or indices.

Thank you for your help!

00schneider asked Jul 01 '19 14:07


3 Answers

As a quick and simple solution, you can strip the letters with replace and compare the result with the original frame; any cell that changed contained a letter:

df.replace('(?i)[a-z]', '', regex=True).ne(df).any()

A    False
B     True
C     True
dtype: bool

df.columns[df.replace('(?i)[a-z]', '', regex=True).ne(df).any()]
# Index(['B', 'C'], dtype='object')

Another option is applying str.contains column-wise:

import re  # needed for re.IGNORECASE

mask = df.astype(str).apply(
    lambda x: x.str.contains(r'[a-z]', flags=re.IGNORECASE)).any()
mask

A    False
B     True
C     True
dtype: bool

df.columns[mask]
# Index(['B', 'C'], dtype='object')
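
The column-wise str.contains idea can be wrapped in a small helper. This is only a minimal sketch: the name columns_with_letters is hypothetical, and dropping missing values before the cast is an added assumption, since in many pandas versions astype(str) renders a missing value as the text 'nan', which would itself match the letter pattern.

```python
import pandas as pd


def columns_with_letters(df: pd.DataFrame) -> list:
    """Return the labels of columns where at least one cell contains an ASCII letter.

    Hypothetical helper for illustration only.
    """
    # Per column: drop missing values (assumption: they should not count as text),
    # cast the remaining cells to str, and test each one against [A-Za-z].
    mask = df.apply(
        lambda col: col.dropna().astype(str).str.contains(r"[A-Za-z]").any()
    )
    # Use the boolean mask to pick the matching column labels.
    return df.columns[mask.to_numpy(dtype=bool)].tolist()


df = pd.DataFrame({"A": [1, 2, 3], "B": ["one", 2, 3], "C": [4, 5, "6Y"]})
print(columns_with_letters(df))  # ['B', 'C']
```

The pattern only covers ASCII letters, as in the answer above; accented or non-Latin letters would need a different pattern.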

cs95 answered Oct 07 '22 14:10


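We could use pd.to_numeric with errors='coerce', which turns every value that cannot be parsed as a number into NaN. Note that this flags any non-numeric value, not only those containing letters: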

df.apply(pd.to_numeric, errors='coerce').isna().any().tolist()
# [False, True, True]
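
To see where the to_numeric check and a letter check can disagree, here is a small comparison sketch. The column D and its value '1-2' are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# "D" is a hypothetical column: '1-2' is not a number, but it has no letter either.
df = pd.DataFrame({"A": [1, 2, 3], "B": ["one", 2, 3], "D": ["1-2", 3, 4]})

# Flags columns holding any value that cannot be parsed as a number.
non_numeric = df.apply(pd.to_numeric, errors="coerce").isna().any()

# Flags columns holding any value with an ASCII letter.
has_letter = df.astype(str).apply(lambda col: col.str.contains(r"[A-Za-z]")).any()

print(non_numeric.tolist())  # [False, True, True]
print(has_letter.tolist())   # [False, True, False]
```

If the data can only ever contain digits and letters, both checks agree; otherwise the regex version is the one that answers the question as asked.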

yatu answered Oct 07 '22 14:10


In that case you can use pd.to_numeric:

df.apply(pd.to_numeric, errors='coerce').isnull().any()
Out[37]: 
A    False
B     True
C     True
dtype: bool

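Update: to test for letters specifically, stack the frame and use str.contains (na=False makes non-string cells count as no match):

df.stack().str.contains('[a-zA-Z]', na=False).groupby(level=1).any()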
Out[62]: 
A    False
B     True
C     True
dtype: bool
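
The stack-based approach may be easier to follow when broken into steps. The sketch below casts to str first, which is an assumption added here so that every cell is tested as text; the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": ["one", 2, 3], "C": [4, 5, "6Y"]})

# stack() reshapes the frame into one long Series indexed by (row label, column label).
stacked = df.astype(str).stack()

# Test every cell once for an ASCII letter.
cell_has_letter = stacked.str.contains(r"[a-zA-Z]")

# Index level 1 holds the column labels, so grouping on it reduces per column.
has_letter = cell_has_letter.groupby(level=1).any()

print(has_letter.tolist())  # [False, True, True]
```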

BENY answered Oct 07 '22 14:10