Is there a better way to determine whether a variable in Pandas
and/or NumPy
is numeric
or not ?
I have a self defined dictionary
with dtypes
as keys and numeric
/ not
as values.
Pandas str. isdigit() method is used to check if all characters in each string in series are digits. Whitespace or any other character occurrence in the string would return false. If the number is in decimal, then also false will be returned since this is a string method and '.
To check the data type in pandas DataFrame we can use the “dtype” attribute. The attribute returns a series with the data type of each column. And the column names of the DataFrame are represented as the index of the resultant series object and the corresponding data types are returned as values of the series object.
In pandas 0.20.2
you can do:
import pandas as pd from pandas.api.types import is_string_dtype from pandas.api.types import is_numeric_dtype df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1.0, 2.0, 3.0]}) is_string_dtype(df['A']) >>>> True is_numeric_dtype(df['B']) >>>> True
You can use np.issubdtype
to check if the dtype is a sub dtype of np.number
. Examples:
np.issubdtype(arr.dtype, np.number) # where arr is a numpy array np.issubdtype(df['X'].dtype, np.number) # where df['X'] is a pandas Series
This works for numpy's dtypes but fails for pandas specific types like pd.Categorical as Thomas noted. If you are using categoricals is_numeric_dtype
function from pandas is a better alternative than np.issubdtype.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [1.0, 2.0, 3.0], 'C': [1j, 2j, 3j], 'D': ['a', 'b', 'c']}) df Out: A B C D 0 1 1.0 1j a 1 2 2.0 2j b 2 3 3.0 3j c df.dtypes Out: A int64 B float64 C complex128 D object dtype: object
np.issubdtype(df['A'].dtype, np.number) Out: True np.issubdtype(df['B'].dtype, np.number) Out: True np.issubdtype(df['C'].dtype, np.number) Out: True np.issubdtype(df['D'].dtype, np.number) Out: False
For multiple columns you can use np.vectorize:
is_number = np.vectorize(lambda x: np.issubdtype(x, np.number)) is_number(df.dtypes) Out: array([ True, True, True, False], dtype=bool)
And for selection, pandas now has select_dtypes
:
df.select_dtypes(include=[np.number]) Out: A B C 0 1 1.0 1j 1 2 2.0 2j 2 3 3.0 3j
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With