Let's say I have the following data set:
import pandas as pd
df = pd.DataFrame(
    {'A': [1, 2, 3],
     'B': ['one', 2, 3],
     'C': [4, 5, '6Y']
    })
I would like to find out, without any cumbersome for loop, which columns contain at least one value with an alphabetical letter (here: B and C). I guess the result should be either a list of booleans or a list of indices.
Thank you for your help!
As a quick and simple solution, you can use replace and filter:
df.replace('(?i)[a-z]', '', regex=True).ne(df).any()
A False
B True
C True
dtype: bool
df.columns[df.replace('(?i)[a-z]', '', regex=True).ne(df).any()]
# Index(['B', 'C'], dtype='object')
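To see why this works, the one-liner can be split into steps. This is only a sketch; the intermediate names (stripped, changed, mask) are illustrative and not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': ['one', 2, 3], 'C': [4, 5, '6Y']})

# 1. Remove every letter from string cells; non-string cells are left as-is.
stripped = df.replace('(?i)[a-z]', '', regex=True)

# 2. A cell differs from the original only if a letter was removed from it.
changed = stripped.ne(df)

# 3. Reduce column-wise: True if any cell in the column changed.
mask = changed.any()

print(df.columns[mask].tolist())
```

The (?i) prefix makes the pattern case-insensitive, so [a-z] also matches the uppercase Y in '6Y'.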
Another option is applying str.contains column-wise:
import re
mask = df.astype(str).apply(
    lambda x: x.str.contains(r'[a-z]', flags=re.IGNORECASE)).any()
mask
A False
B True
C True
dtype: bool
df.columns[mask]
# Index(['B', 'C'], dtype='object')
We could use pd.to_numeric:
df.apply(pd.to_numeric, errors='coerce').isna().any().tolist()
# [False, True, True]
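One caveat, sketched below: pd.to_numeric with errors='coerce' flags every value it cannot parse as a number, not only values containing letters. The column D here is a hypothetical addition to the question's data, included to show where the two checks disagree.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, 3],
    'B': ['one', 2, 3],
    'C': [4, 5, '6Y'],
    'D': [7, '8-9', 10],  # hypothetical: non-numeric, but contains no letters
})

# Flags columns holding any value that cannot be converted to a number.
non_numeric = df.apply(pd.to_numeric, errors='coerce').isna().any()

# Flags columns holding any value with an ASCII letter.
has_letter = df.astype(str).apply(lambda s: s.str.contains(r'[A-Za-z]')).any()

print(non_numeric.tolist())
print(has_letter.tolist())
```

For the question's original three columns both checks agree; they differ only for values such as '8-9' that are neither numeric nor alphabetical.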
In that case, you can use to_numeric:
df.apply(pd.to_numeric,errors='coerce').isnull().any()
Out[37]:
A False
B True
C True
dtype: bool
Update
df.stack().str.contains('[a-zA-Z]').groupby(level=1).any()
Out[62]:
A False
B True
C True
dtype: bool