
pandas: how to check the dtype for all columns in a DataFrame?

The singular form, dtype, gives the data type of a single column. The plural form, dtypes, is a DataFrame attribute that returns the data types of all columns. Essentially:

For a single column:

dataframe.column.dtype

For all columns:

dataframe.dtypes

Example:

import pandas as pd
df = pd.DataFrame({'A': [1,2,3], 'B': [True, False, False], 'C': ['a', 'b', 'c']})

df.A.dtype
# dtype('int64')
df.B.dtype
# dtype('bool')
df.C.dtype
# dtype('O')

df.dtypes
#A     int64
#B      bool
#C    object
#dtype: object
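Attribute access such as df.A only works when the column name is a valid Python identifier. As a small sketch (the column name "my col" is made up for illustration), bracket access works for any label, and since df.dtypes is a Series indexed by column label, it can be indexed the same way:

```python
import pandas as pd

# Hypothetical frame; "my col" cannot be reached as df.my col.
df = pd.DataFrame({"A": [1, 2, 3], "my col": [0.5, 1.5, 2.5]})

# Bracket access works for any column label.
col_dtype = df["my col"].dtype

# df.dtypes is a Series keyed by column label, so one entry can be looked up.
same_dtype = df.dtypes["my col"]

print(col_dtype)  # float64
print(col_dtype == same_dtype)
```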

If df is a pandas DataFrame, you can get the number of non-null values and the data type of every column at once with:

df.info()
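A minimal sketch of what that looks like, assuming a small made-up frame with some missing values. Here the report is written to a StringIO buffer through the buf parameter so it can be inspected as a string; calling df.info() with no arguments prints the same report directly.

```python
import io

import pandas as pd

# Hypothetical frame with one missing value in each column.
df = pd.DataFrame({"A": [1.0, None, 3.0], "B": ["x", "y", None]})

# df.info() prints to stdout by default; buf redirects the report.
buf = io.StringIO()
df.info(buf=buf)
report = buf.getvalue()
print(report)

# The same non-null counts, computed directly.
non_null = df.notna().sum()
print(non_null)
```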

To go one step further, I assume you want to do something with these dtypes. df.dtypes.to_dict() comes in handy.

my_type = 'float64'

dtypes = dataframe.dtypes.to_dict()

for col_name, typ in dtypes.items():
    if typ != my_type:  # a dtype can be compared directly against a string
        raise ValueError(f"Yikes - `dataframe['{col_name}'].dtype == {typ}` not {my_type}")

You'll find that pandas dtypes compare cleanly against both NumPy classes and user-provided strings. For example, even 'double' == dataframe['col_name'].dtype evaluates to True when the column's dtype is np.float64.
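To see the comparison behavior concretely, here is a small sketch using a made-up float column named "x". It only covers a plain NumPy float64 dtype; pandas extension dtypes may behave differently.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column frame of floats.
df = pd.DataFrame({"x": [1.0, 2.5, 3.0]})
dt = df["x"].dtype

# Compare the same dtype against several spellings.
checks = {
    "string 'float64'": dt == "float64",
    "alias 'double'": dt == "double",
    "np.float64 class": dt == np.float64,
    "string 'int64'": dt == "int64",
}
for label, result in checks.items():
    print(label, result)
```

The first three comparisons are True and the last is False, so a check like typ != my_type behaves the same whether my_type is a string or a NumPy class.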


If you have many columns, df.info() or df.dtypes may show only a summary of the columns, or just a few columns from the top and bottom, like:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4387 entries, 1 to 4387
Columns: 119 entries, ColumnA to ColumnZ
dtypes: datetime64[ns](24), float64(54), object(41)
memory usage: 4.0+ MB

This only tells you that 24 columns are datetime64[ns], 54 are float64, and 41 are object.

So, if you want the datatype of each column in one command, do:

dict(df.dtypes)
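As a rough sketch with a made-up wide frame (100 float columns named col_0 to col_99), the dict holds every column, and Series.to_string() is another way to get a complete listing, since it renders all rows rather than eliding the middle:

```python
import pandas as pd

# Hypothetical wide frame: 100 float columns named col_0 ... col_99.
wide = pd.DataFrame({f"col_{i}": [0.0, 1.0] for i in range(100)})

# Plain dict of column label -> dtype; nothing is elided.
as_dict = dict(wide.dtypes)

# to_string() renders every row of the dtypes Series.
full_listing = wide.dtypes.to_string()
print(full_listing)

# Number of columns per dtype, like the summary line of df.info().
counts = wide.dtypes.value_counts()
print(counts)
```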