Find all columns of dataframe in Pandas whose type is float, or a particular type?

I have a dataframe, df, with some columns of type float64 and others of type object. Because of this mix of types, I cannot simply use

df.fillna('unknown')  # raises "ValueError: could not convert string to float:"

because the error is raised for the columns whose type is float64 (what a misleading error message!),

so I wish I could do something like

for col in df.columns[<dtype == object>]:
    df[col] = df[col].fillna("unknown")

So my question is: is there a filter expression like this that I can use with df.columns?

Alternatively, and less elegantly, I guess I could do:

from numpy import dtype

for col in df.columns:
    if df[col].dtype == dtype('O'):  # object dtype
        df[col] = df[col].fillna('')
        # still puzzled: only the empty string works as the replacement here; using 'unknown'
        # fails for certain values with "ValueError: Error parsing datetime string "unknown" at position 0"

I would also like to know why, in the code above, replacing '' with 'unknown' works for some cells but fails on a certain cell with "ValueError: Error parsing datetime string "unknown" at position 0".

Thanks a lot!

Yu

asked Feb 12 '14 by Yu Shen

People also ask

How do you check the data types of all columns in a Pandas Dataframe?

To check the data types in a pandas DataFrame, use the dtypes attribute. It returns a Series whose index is the DataFrame's column names and whose values are the corresponding data types.
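
A minimal sketch (the column names here are made up for illustration):

import pandas as pd

df = pd.DataFrame({"price": [1.5, 2.0], "label": ["a", "b"]})

print(df.dtypes)
# price    float64
# label     object
# dtype: object

print(df.dtypes["label"])  # object  (dtype of a single column)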

Which method returns a list of all columns and their data types?

Use DataFrame.dtypes to get the data types of the columns in a DataFrame. In Python's pandas module, the DataFrame class provides this attribute, which returns a Series containing the data type information of each column.
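
Since df.dtypes is a Series indexed by column name, it is easy to turn into a dict or a list of (column, dtype) pairs (again a small illustrative sketch):

import pandas as pd

df = pd.DataFrame({"price": [1.5, 2.0], "label": ["a", "b"]})

print(df.dtypes.to_dict())      # {'price': dtype('float64'), 'label': dtype('O')}
print(list(df.dtypes.items()))  # [('price', dtype('float64')), ('label', dtype('O'))]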

How do I select specific columns in pandas?

The most basic way to select a single column from a DataFrame is to put the column's name, as a string, in brackets; this returns a pandas Series. Passing a list of names in the brackets selects several columns at once and returns a DataFrame.
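
A minimal sketch (column names made up):

import pandas as pd

df = pd.DataFrame({"name": ["a", "b"], "price": [1.5, 2.0], "qty": [3, 4]})

s = df["price"]            # a single name in brackets -> pandas Series
sub = df[["name", "qty"]]  # a list of names in brackets -> DataFrame with those columns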


3 Answers

This is more concise:

import numpy as np

# select the float columns
df_float = df.select_dtypes(include=[np.float64])

# select the non-numeric columns
df_non_num = df.select_dtypes(exclude=[np.number])
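
A side note (my addition, not part of the original answer): select_dtypes also accepts dtype names as strings, which avoids the NumPy import entirely:

# df is the question's DataFrame
df_float = df.select_dtypes(include=["float64"])
df_non_num = df.select_dtypes(exclude=["number"])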
answered by RNA


You can see what the dtype is for all the columns using the dtypes attribute:

In [11]: df = pd.DataFrame([[1, 'a', 2.]])

In [12]: df
Out[12]: 
   0  1  2
0  1  a  2

In [13]: df.dtypes
Out[13]: 
0      int64
1     object
2    float64
dtype: object

In [14]: df.dtypes == object
Out[14]: 
0    False
1     True
2    False
dtype: bool

To access the object columns:

In [15]: df.loc[:, df.dtypes == object]
Out[15]: 
   1
0  a

I think it's most explicit to use (I'm not sure that inplace would work here):

In [16]: df.loc[:, df.dtypes == object] = df.loc[:, df.dtypes == object].fillna('')

That said, I recommend using NaN for missing data.
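
For example, keeping NaN means pandas still recognizes the values as missing (a small illustration, not from the original answer):

import pandas as pd
import numpy as np

s = pd.Series(["a", np.nan, "c"])

print(s.isna().sum())             # 1 -- NaN is counted as missing
print(s.fillna("").isna().sum())  # 0 -- after replacing with '', pandas no longer sees it as missing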

answered by Andy Hayden


As @RNA said, you can use pandas.DataFrame.select_dtypes. Using the example from the question, the code would look like this:

for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].fillna('unknown')
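
As a small variation (my addition, not from the original answer), the same fill can be done without an explicit loop by assigning to the selected columns at once:

# df is the question's DataFrame
obj_cols = df.select_dtypes(include=["object"]).columns
df[obj_cols] = df[obj_cols].fillna("unknown")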
answered by Jaroslav Bezděk