
How to determine whether a column/variable is numeric or not in Pandas/NumPy?

Is there a better way to determine whether a variable in Pandas and/or NumPy is numeric or not?

At the moment I have a self-defined dictionary with dtypes as keys and numeric / not numeric as values.

user2808117 asked Nov 11 '13 06:11




2 Answers

In pandas 0.20.2 you can do:

    import pandas as pd
    from pandas.api.types import is_string_dtype
    from pandas.api.types import is_numeric_dtype

    df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1.0, 2.0, 3.0]})

    is_string_dtype(df['A'])   # True
    is_numeric_dtype(df['B'])  # True
danthelion answered Sep 24 '22 20:09


You can use np.issubdtype to check if the dtype is a sub dtype of np.number. Examples:

    np.issubdtype(arr.dtype, np.number)      # where arr is a numpy array
    np.issubdtype(df['X'].dtype, np.number)  # where df['X'] is a pandas Series

This works for NumPy's dtypes but fails for pandas-specific types like pd.Categorical, as Thomas noted. If you are using categoricals, the is_numeric_dtype function from pandas is a better alternative than np.issubdtype.

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [1.0, 2.0, 3.0],
                       'C': [1j, 2j, 3j], 'D': ['a', 'b', 'c']})
    df
    Out:
       A    B   C  D
    0  1  1.0  1j  a
    1  2  2.0  2j  b
    2  3  3.0  3j  c

    df.dtypes
    Out:
    A         int64
    B       float64
    C    complex128
    D        object
    dtype: object

    np.issubdtype(df['A'].dtype, np.number)
    Out: True

    np.issubdtype(df['B'].dtype, np.number)
    Out: True

    np.issubdtype(df['C'].dtype, np.number)
    Out: True

    np.issubdtype(df['D'].dtype, np.number)
    Out: False
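The categorical caveat mentioned earlier can be illustrated with a rough sketch. It assumes a reasonably recent pandas and NumPy; the column names and the helper numpy_says_numeric are hypothetical and exist only for this illustration. The idea is that np.issubdtype raises a TypeError when handed a categorical dtype, because NumPy cannot interpret it, while is_numeric_dtype just returns False.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical example data: one plain float column, one categorical column.
df = pd.DataFrame({
    "price": [1.5, 2.5, 3.5],
    "grade": pd.Categorical(["a", "b", "a"]),
})


def numpy_says_numeric(series):
    """Hypothetical helper: run np.issubdtype and report a failure as a string."""
    try:
        return bool(np.issubdtype(series.dtype, np.number))
    except TypeError:
        # NumPy cannot interpret pandas extension dtypes such as category.
        return "TypeError"


# For each column: (result from np.issubdtype, result from is_numeric_dtype)
results = {
    col: (numpy_says_numeric(df[col]), bool(is_numeric_dtype(df[col])))
    for col in df.columns
}
print(results)
```

If the data may contain pandas-specific dtypes, calling is_numeric_dtype directly avoids the need for the try/except wrapper.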

For multiple columns you can use np.vectorize:

    is_number = np.vectorize(lambda x: np.issubdtype(x, np.number))
    is_number(df.dtypes)
    Out: array([ True,  True,  True, False], dtype=bool)
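A possible alternative for checking several columns, sketched here with is_numeric_dtype instead of np.vectorize so that it also copes with pandas-specific dtypes. Column E is an extra categorical column added for illustration and is not part of the example above; numeric_columns is a made-up name.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({
    "A": [1, 2, 3],
    "B": [1.0, 2.0, 3.0],
    "C": [1j, 2j, 3j],
    "D": ["a", "b", "c"],
    "E": pd.Categorical(["x", "y", "x"]),  # extra column, added for illustration
})

# Keep the names of the columns whose dtype pandas considers numeric.
numeric_columns = [col for col in df.columns if is_numeric_dtype(df[col])]
print(numeric_columns)
```

A plain list comprehension is used here because it needs no NumPy wrapper and returns column names rather than a positional boolean array.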

And for selection, pandas now has select_dtypes:

    df.select_dtypes(include=[np.number])
    Out:
       A    B   C
    0  1  1.0  1j
    1  2  2.0  2j
    2  3  3.0  3j
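When only the column names are needed, select_dtypes can be combined with .columns, and its exclude parameter gives the complement. This is a small sketch under the same example data; the variable names numeric_cols and other_cols are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [1.0, 2.0, 3.0],
                   "C": [1j, 2j, 3j], "D": ["a", "b", "c"]})

# Names of the numeric columns (int, float and complex all derive from np.number).
numeric_cols = df.select_dtypes(include=[np.number]).columns.tolist()

# Names of everything else, using the exclude parameter.
other_cols = df.select_dtypes(exclude=[np.number]).columns.tolist()

print(numeric_cols, other_cols)
```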
ayhan answered Sep 25 '22 20:09