I want to check whether every column in a DataFrame contains only numeric values. How can I find this out?
Pandas' str.isdigit() method checks whether all characters in each string of a Series are digits. Any whitespace or other non-digit character makes it return False. Decimal numbers also return False, because this is a string method and '.' is not a digit.
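Here is a minimal sketch of this behaviour on a made-up Series of strings. Note that a minus sign also makes the check fail:

```python
import pandas as pd

# Made-up strings: plain digits, leading whitespace, a decimal point, a minus sign
s = pd.Series(['123', ' 42', '3.14', '-7', '0'])

# str.isdigit() is True only when every character of the string is a digit
print(s.str.isdigit().tolist())  # [True, False, False, False, True]
```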
DataFrame.all() checks by default whether all values in each column are True. Specify axis='columns' to check each row instead: a row evaluates to True only if every value in it is True.
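This small example shows the two axes on a made-up boolean DataFrame:

```python
import pandas as pd

# A small boolean DataFrame (made-up data)
flags = pd.DataFrame({'x': [True, True, True],
                      'y': [True, False, True]})

# Default axis: one result per column
print(flags.all().tolist())                # [True, False]

# axis='columns': one result per row
print(flags.all(axis='columns').tolist())  # [True, False, True]
```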
You can check this with pd.to_numeric, coercing errors so that any value that cannot be parsed becomes NaN:
pd.to_numeric(df['column'], errors='coerce').notnull().all()
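One caveat: errors='coerce' leaves values that were already missing as NaN too. A column with a genuine NaN therefore fails this check even if every present value is numeric. If you would rather ignore missing values, here is a sketch. This is an assumption about what you want, not part of the original answer:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, '2', np.nan, 4.5])  # made-up data with one real missing value

coerced = pd.to_numeric(s, errors='coerce')

# Strict check: an originally missing value also counts as "not numeric"
strict = coerced.notnull().all()

# Lenient check: only values that failed conversion count as "not numeric"
lenient = (coerced.notnull() | s.isnull()).all()

print(strict, lenient)  # False True
```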
To check all columns, you can iterate through them or simply use apply:
df.apply(lambda s: pd.to_numeric(s, errors='coerce').notnull().all())
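The explicit loop looks like this. The DataFrame here is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({'price': ['10', '20.5', '-3'],
                   'label': ['a', '7', 'b']})

# Explicit loop over the columns instead of df.apply
is_numeric = {}
for name in df.columns:
    converted = pd.to_numeric(df[name], errors='coerce')  # unparsable -> NaN
    is_numeric[name] = bool(converted.notnull().all())

print(is_numeric)  # {'price': True, 'label': False}
```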
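For example:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col' : [1, 2, 10, np.nan, 'a'],
                   'col2': ['a', 10, 30, 40, 50],
                   'col3': [1, 2, 3, 4, 5.0]})
df.apply(lambda s: pd.to_numeric(s, errors='coerce').notnull().all())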
Output:
col     False
col2    False
col3     True
dtype: bool
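You can then use the resulting boolean Series to list or keep only the fully numeric columns. This sketch reuses the example DataFrame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col' : [1, 2, 10, np.nan, 'a'],
                   'col2': ['a', 10, 30, 40, 50],
                   'col3': [1, 2, 3, 4, 5.0]})

mask = df.apply(lambda s: pd.to_numeric(s, errors='coerce').notnull().all())

# Names of the columns that passed the check
print(mask[mask].index.tolist())  # ['col3']

# Or keep only those columns (the mask aligns on the column labels)
numeric_df = df.loc[:, mask]
```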
You can get an element-wise True/False result with str.isnumeric(). The values must be stored as strings; non-string elements come back as NaN:
>>> df
       A      B
0      1      1
1    NaN      6
2    NaN    NaN
3      2      2
4    NaN    NaN
5      4      4
6   some   some
7  value  other
>>> df.A.str.isnumeric()
0     True
1      NaN
2      NaN
3     True
4      NaN
5     True
6    False
7    False
Name: A, dtype: object
The same works for column B with df.B.str.isnumeric().
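Like Python's own str.isnumeric(), the pandas version still rejects minus signs and decimal points. It also accepts some Unicode characters that are not plain 0-9 digits. Here is a quick comparison with str.isdigit() and str.isdecimal() on made-up strings:

```python
import pandas as pd

# '\u00b2' is superscript two, '\u00bd' is the vulgar fraction one half
s = pd.Series(['12', '-12', '1.5', '\u00b2', '\u00bd'])

print(s.str.isdecimal().tolist())  # [True, False, False, False, False]
print(s.str.isdigit().tolist())    # [True, False, False, True, False]
print(s.str.isnumeric().tolist())  # [True, False, False, True, True]
```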
Alternatively, use apply(), which is more convenient when you need to check the whole DataFrame at once. Here is a test DataFrame with two columns: A of mixed type, and B containing only numbers (the values are stored as strings):
>>> df
       A   B
0      1   1
1    NaN   6
2    NaN  33
3      2   2
4    NaN  22
5      4   4
6   some  66
7  value  11
Result:
>>> df.apply(lambda x: x.str.isnumeric())
       A     B
0   True  True
1    NaN  True
2    NaN  True
3   True  True
4    NaN  True
5   True  True
6  False  True
7  False  True
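To reduce this to one True/False per column, here is a sketch that treats missing values as non-numeric. That choice is an assumption you may want to change:

```python
import numpy as np
import pandas as pd

# Rebuilt from the test DataFrame above (all values stored as strings)
df = pd.DataFrame({'A': ['1', np.nan, np.nan, '2', np.nan, '4', 'some', 'value'],
                   'B': ['1', '6', '33', '2', '22', '4', '66', '11']})

# Element-wise check; on object columns, missing values come back as NaN
checks = df.apply(lambda x: x.str.isnumeric())

# eq(True) maps NaN to False, so a missing value counts as non-numeric
result = checks.eq(True).all()
print(result)
```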
Now consider the DataFrame below, whose columns have different data types:
>>> df
   num  rating    name  age
0    0    80.0  shakir   33
1    1   -22.0   rafiq   37
2    2   -10.0     dev   36
3  num     1.0   suraj   30
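This addresses the OP's comment on this answer that the data contains negative values and zeros. String checks such as str.isdigit() reject the minus sign, so it is more reliable to select columns by dtype.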
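A quick check on made-up strings shows the difference. str.isdigit() fails on the minus sign and the decimal point, while pd.to_numeric parses them:

```python
import pandas as pd

values = pd.Series(['80', '-22', '0', '-10.5'])

# String methods reject the minus sign and the decimal point
print(values.str.isdigit().tolist())  # [True, False, True, False]

# to_numeric parses negatives, zeros and decimals alike
print(pd.to_numeric(values, errors='coerce').notnull().all())  # True
```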
1- _get_numeric_data() is a pseudo-internal (private) method that returns only the columns with numeric dtypes:
>>> df._get_numeric_data()
   rating  age
0    80.0   33
1   -22.0   37
2   -10.0   36
3     1.0   30
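Because _get_numeric_data() is private, it may change without notice. A public alternative is pandas.api.types.is_numeric_dtype applied to each column. This is my substitution, not part of the original answer. Note that it also counts bool columns as numeric. The sketch below rebuilds the example DataFrame:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Same values as the example DataFrame above
df = pd.DataFrame({'num': [0, 1, 2, 'num'],
                   'rating': [80.0, -22.0, -10.0, 1.0],
                   'name': ['shakir', 'rafiq', 'dev', 'suraj'],
                   'age': [33, 37, 36, 30]})

# is_numeric_dtype inspects each column's dtype, not its individual values
numeric_cols = [c for c in df.columns if is_numeric_dtype(df[c])]
print(numeric_cols)  # ['rating', 'age']
```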
OR
2- Use the public select_dtypes method (defined in pandas.core.frame), which returns a subset of the DataFrame's columns based on their dtypes. Its include and exclude parameters control which dtypes are kept or dropped:
>>> df.select_dtypes(include=['int64','float64']) # choosing int & float
   rating  age
0    80.0   33
1   -22.0   37
2   -10.0   36
3     1.0   30
>>> df.select_dtypes(include=['int64']) # choose int
   age
0   33
1   37
2   36
3   30
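Instead of listing 'int64' and 'float64' explicitly, select_dtypes also accepts 'number', which covers every numeric dtype (for example int32 or float32 as well). This sketch also shows exclude and answers the original question of whether every column is numeric:

```python
import pandas as pd

# Same values as the example DataFrame above
df = pd.DataFrame({'num': [0, 1, 2, 'num'],
                   'rating': [80.0, -22.0, -10.0, 1.0],
                   'name': ['shakir', 'rafiq', 'dev', 'suraj'],
                   'age': [33, 37, 36, 30]})

# 'number' selects all numeric dtypes, not just int64/float64
numeric = df.select_dtypes(include='number')
print(numeric.columns.tolist())      # ['rating', 'age']

# exclude works the other way round: drop the numeric columns
non_numeric = df.select_dtypes(exclude='number')
print(non_numeric.columns.tolist())  # ['num', 'name']

# Is every column numeric?
all_numeric = numeric.shape[1] == df.shape[1]
print(all_numeric)                   # False
```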