Is there a better way to determine whether a variable in Pandas and/or NumPy is numeric or not?
At the moment I use a self-defined dictionary with dtypes as keys and numeric / not numeric as values.
The pandas str.isdigit() method checks whether all characters in each string of a Series are digits. Whitespace or any other non-digit character in a string makes the result False. A decimal number such as '12.5' also returns False, since this is a string method and '.' is not a digit.
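A minimal sketch of that behaviour, using made-up sample strings (the Series `s` and its values are assumptions for illustration, not part of the original question):

```python
import pandas as pd

# Hypothetical sample data: plain digits, a decimal, leading whitespace,
# letters, and digits with leading zeros.
s = pd.Series(["123", "12.5", " 42", "abc", "007"])

# str.isdigit() is applied element-wise to each string.
digit_mask = s.str.isdigit()

print(digit_mask.tolist())  # [True, False, False, False, True]
```

Note that this inspects the characters of string values. It does not say anything about the dtype of the column, which is what the question asks about.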
To check the data types in a pandas DataFrame, use the dtypes attribute. It returns a Series whose index holds the column names of the DataFrame and whose values are the corresponding data types. For a single column (a Series), the attribute is dtype.
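As a small illustration, assuming a toy DataFrame (the column names and values here are made up):

```python
import pandas as pd

# Hypothetical frame with an integer, a float, and a string column.
df = pd.DataFrame({"A": [1, 2, 3], "B": [1.0, 2.0, 3.0], "D": ["a", "b", "c"]})

# DataFrame.dtypes: a Series indexed by column name.
col_types = df.dtypes
print(col_types.index.tolist())  # ['A', 'B', 'D']

# Series.dtype: the dtype of one column.
b_type = df["B"].dtype
print(b_type)  # float64
```

The exact integer and string dtypes reported can vary with platform and pandas version, so the sketch only prints the float column's dtype.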
In pandas 0.20.2 you can do:

    import pandas as pd
    from pandas.api.types import is_string_dtype
    from pandas.api.types import is_numeric_dtype

    df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1.0, 2.0, 3.0]})

    is_string_dtype(df['A'])
    >>>> True

    is_numeric_dtype(df['B'])
    >>>> True

You can use np.issubdtype to check if the dtype is a sub dtype of np.number. Examples:

    np.issubdtype(arr.dtype, np.number)  # where arr is a numpy array
    np.issubdtype(df['X'].dtype, np.number)  # where df['X'] is a pandas Series

This works for numpy's dtypes but fails for pandas-specific types like pd.Categorical, as Thomas noted. If you are using categoricals, the is_numeric_dtype function from pandas is a better alternative than np.issubdtype.
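The difference on a categorical column can be sketched as follows. The Series `cat` is a made-up example, and the `try`/`except` is only there to show the failure without stopping the script; it assumes NumPy rejects the pandas dtype with a TypeError:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical categorical column.
cat = pd.Series(["low", "high", "low"], dtype="category")

# NumPy does not understand pandas' CategoricalDtype, so the check
# is expected to raise rather than return False.
try:
    np_result = np.issubdtype(cat.dtype, np.number)
except TypeError as exc:
    np_result = None
    print("np.issubdtype failed:", exc)

# The pandas helper handles extension dtypes and simply reports False.
pd_result = is_numeric_dtype(cat)
print(pd_result)  # False
```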

    df = pd.DataFrame({'A': [1, 2, 3], 'B': [1.0, 2.0, 3.0],
                       'C': [1j, 2j, 3j], 'D': ['a', 'b', 'c']})
    df
    Out:
       A    B   C  D
    0  1  1.0  1j  a
    1  2  2.0  2j  b
    2  3  3.0  3j  c

    df.dtypes
    Out:
    A         int64
    B       float64
    C    complex128
    D        object
    dtype: object

    np.issubdtype(df['A'].dtype, np.number)
    Out: True

    np.issubdtype(df['B'].dtype, np.number)
    Out: True

    np.issubdtype(df['C'].dtype, np.number)
    Out: True

    np.issubdtype(df['D'].dtype, np.number)
    Out: False

For multiple columns you can use np.vectorize:

    is_number = np.vectorize(lambda x: np.issubdtype(x, np.number))
    is_number(df.dtypes)
    Out: array([ True,  True,  True, False], dtype=bool)

And for selection, pandas now has select_dtypes:

    df.select_dtypes(include=[np.number])
    Out:
       A    B   C
    0  1  1.0  1j
    1  2  2.0  2j
    2  3  3.0  3j

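If only the names of the numeric columns are needed, one possible sketch is below. The helper `numeric_columns` is a hypothetical name introduced here for illustration, not a pandas function; it combines is_numeric_dtype with a list comprehension and compares the result with select_dtypes:

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({"A": [1, 2, 3], "B": [1.0, 2.0, 3.0],
                   "C": [1j, 2j, 3j], "D": ["a", "b", "c"]})

def numeric_columns(frame):
    """Return the names of columns whose dtype pandas considers numeric."""
    return [name for name in frame.columns if is_numeric_dtype(frame[name])]

via_helper = numeric_columns(df)
via_select = df.select_dtypes(include=[np.number]).columns.tolist()

print(via_helper)  # ['A', 'B', 'C']
print(via_select)  # ['A', 'B', 'C']
```

Unlike the np.vectorize approach, this variant should also cope with pandas extension dtypes such as categoricals, since it never passes a pandas dtype to NumPy.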