
How to check, column by column, whether a pandas DataFrame contains only numeric values?

I want to check, for every column in a DataFrame, whether it contains only numeric values. How can I do that?

Raja Sahe S asked Jan 29 '19 17:01



2 Answers

You can check that using to_numeric and coercing errors:

pd.to_numeric(df['column'], errors='coerce').notnull().all()

To check all columns, you can iterate over them or simply use apply:

df.apply(lambda s: pd.to_numeric(s, errors='coerce').notnull().all())

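For example:

import numpy as np
import pandas as pd

df = pd.DataFrame({'col' : [1, 2, 10, np.nan, 'a'],
                   'col2': ['a', 10, 30, 40, 50],
                   'col3': [1, 2, 3, 4, 5.0]})

Applying the check to every column of this DataFrame outputs: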

col     False
col2    False
col3     True
dtype: bool
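
Note that this check also reports False for a column that is otherwise numeric but contains missing values, because NaN fails notnull(). Below is a minimal sketch of a variant that tolerates missing values, assuming that is the behaviour you want. The helper name is_numeric_column and the extra column col4 are hypothetical additions for illustration, not part of pandas or of the example above.

```python
import numpy as np
import pandas as pd


def is_numeric_column(s: pd.Series) -> bool:
    """Hypothetical helper: True if every non-missing value parses as a number."""
    coerced = pd.to_numeric(s, errors="coerce")
    # A value is acceptable if it converted cleanly or was already missing.
    return bool((coerced.notna() | s.isna()).all())


df = pd.DataFrame({
    "col": [1, 2, 10, np.nan, "a"],
    "col2": ["a", 10, 30, 40, 50],
    "col3": [1, 2, 3, 4, 5.0],
    "col4": [1, 2, np.nan, 4, 5],  # assumed extra column: numeric with a gap
})

# Strict check from the answer: any NaN makes the column fail.
strict = df.apply(lambda s: pd.to_numeric(s, errors="coerce").notnull().all())

# Tolerant check: only values that fail to parse make the column fail.
tolerant = df.apply(is_numeric_column)

print(strict)
print(tolerant)
```

With this data, only col4 should differ between the two results: the strict check rejects it because of the NaN, while the tolerant check accepts it.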

rafaelc answered Oct 14 '22 03:10


You can get a True/False result for each value with str.isnumeric(). It works on string values only, and missing values come back as NaN.

Example:

>>> df
       A      B
0      1      1
1    NaN      6
2    NaN    NaN
3      2      2
4    NaN    NaN
5      4      4
6   some   some
7  value  other

Results:

>>> df.A.str.isnumeric()
0     True
1      NaN
2      NaN
3     True
4      NaN
5     True
6    False
7    False
Name: A, dtype: object

The same call works for column B: df.B.str.isnumeric()

To check every column at once, use apply().

A test DataFrame with two columns, one of mixed type and one containing only numbers (for the result shown, the values must be stored as strings):

>>> df
       A   B
0      1   1
1    NaN   6
2    NaN  33
3      2   2
4    NaN  22
5      4   4
6   some  66
7  value  11

Result:

>>> df.apply(lambda x: x.str.isnumeric())
       A     B
0   True  True
1    NaN  True
2    NaN  True
3   True  True
4    NaN  True
5   True  True
6  False  True
7  False  True
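
One caveat with this approach: str.isnumeric() tests the characters of each string, not its numeric value, so signs and decimal points make it return False, and the .str accessor is not available on integer or float columns at all. A small sketch of the difference, using made-up sample values and comparing against pd.to_numeric:

```python
import pandas as pd

# Made-up sample values, all stored as strings.
s = pd.Series(["1", "-22", "1.5", "0", "some"])

# Character-based test: "-" and "." are not numeric characters.
by_chars = s.str.isnumeric()

# Value-based test: anything that parses as a number passes.
by_value = pd.to_numeric(s, errors="coerce").notna()

print(by_chars.tolist())
print(by_value.tolist())

# On a column with an integer dtype, .str raises AttributeError.
ints = pd.Series([1, 6, 33])
try:
    ints.str.isnumeric()
    str_accessor_ok = True
except AttributeError:
    str_accessor_ok = False

print(str_accessor_ok)
```

If the data may contain negative numbers or decimals, the value-based test is probably the safer choice.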

Another example:

Consider the following DataFrame, whose columns have different data types:

>>> df
   num  rating    name  age
0    0    80.0  shakir   33
1    1   -22.0   rafiq   37
2    2   -10.0     dev   36
3  num     1.0   suraj   30

This example follows a comment from the OP on this answer about data that contains negative values and zeros. For such data, you can select the numeric columns by dtype instead:

1- _get_numeric_data() is a pseudo-internal method that returns only the columns with a numeric dtype.

>>> df._get_numeric_data()
   rating  age
0    80.0   33
1   -22.0   37
2   -10.0   36
3     1.0   30

OR

2- Alternatively, use the DataFrame method select_dtypes, which returns a subset of the DataFrame's columns based on their dtypes. Its include and exclude parameters control which dtypes are kept.

>>> df.select_dtypes(include=['int64','float64']) # choosing int & float
   rating  age
0    80.0   33
1   -22.0   37
2   -10.0   36
3     1.0   30

>>> df.select_dtypes(include=['int64'])  # choose int
   age
0   33
1   37
2   36
3   30
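
Listing 'int64' and 'float64' explicitly would miss columns stored with another numeric dtype, such as int32 or float32. As a hedged sketch, the 'number' selector of select_dtypes and pandas.api.types.is_numeric_dtype can give a dtype-based answer for every column. The DataFrame below is rebuilt from the values printed above, on the assumption that the num column is a mixed object column:

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    "num": [0, 1, 2, "num"],  # mixed values, so not a numeric dtype
    "rating": [80.0, -22.0, -10.0, 1.0],
    "name": ["shakir", "rafiq", "dev", "suraj"],
    "age": [33, 37, 36, 30],
})

# One boolean per column, based purely on the column's dtype.
numeric_mask = pd.Series({col: is_numeric_dtype(df[col]) for col in df.columns})

# 'number' selects every numeric dtype, whatever its width.
numeric_cols = df.select_dtypes(include="number").columns.tolist()

print(numeric_mask)
print(numeric_cols)
```

Both results depend only on the dtype, so a column of numbers stored as strings would still be reported as non-numeric; the to_numeric approach from the other answer covers that case.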

Karn Kumar answered Oct 14 '22 01:10