Find all columns of dataframe in Pandas whose type is float, or a particular type?

Tags:

python

pandas

dataframe

data-cleaning

I have a dataframe, df, in which some columns are of type float64 and the others are of type object. Because of the mixed types, I cannot use

df.fillna('unknown') #getting error "ValueError: could not convert string to float:"

because the error is raised for the columns whose type is float64 (what a misleading error message!).

So I wish I could do something like

for col in df.columns[<dtype == object>]:
    df[col] = df[col].fillna("unknown")

So my question is whether there is a filter expression like this that I can use with df.columns.

Alternatively, and less elegantly, I guess I could do:

for col in df.columns:
    if df[col].dtype == object:  # object dtype
        df[col] = df[col].fillna('')
        # still puzzled: only the empty string works as a replacement; 'unknown' fails for certain values with "ValueError: Error parsing datetime string "unknown" at position 0"

I would also like to know why, in the code above, replacing '' with 'unknown' works for some cells but fails on one cell with the error "ValueError: Error parsing datetime string "unknown" at position 0".

Thanks a lot!

563

asked Feb 12 '14 06:02

Yu Shen

3 Answers

This is more concise:

import numpy as np

# select the float64 columns (the np.float alias was removed in NumPy 1.24)
df_float = df.select_dtypes(include=['float64'])
# select the non-numeric columns
df_non_num = df.select_dtypes(exclude=[np.number])
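As a minimal, self-contained sketch of the `select_dtypes` approach: the sample frame and its column names (`price`, `qty`, `name`) are made up for illustration and do not come from the question. It only shows how to get the column names for a given dtype, which is what the question asks for.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one float64, one int64 and one object column.
df = pd.DataFrame({
    "price": [1.5, np.nan, 3.0],
    "qty": [1, 2, 3],
    "name": pd.Series(["a", None, "c"], dtype=object),
})

# Names of the columns whose dtype is float64.
float_cols = df.select_dtypes(include=["float64"]).columns.tolist()

# Names of every column that is not numeric.
non_numeric_cols = df.select_dtypes(exclude=[np.number]).columns.tolist()

print(float_cols)
print(non_numeric_cols)
```

Calling `.columns` on the result gives the filtered column labels, so the selection can be reused in a loop or in an indexing expression.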

118

answered Sep 18 '22 21:09

RNA

You can see what the dtype is for all the columns using the dtypes attribute:

In [11]: df = pd.DataFrame([[1, 'a', 2.]])

In [12]: df
Out[12]: 
   0  1  2
0  1  a  2

In [13]: df.dtypes
Out[13]: 
0      int64
1     object
2    float64
dtype: object

In [14]: df.dtypes == object
Out[14]: 
0    False
1     True
2    False
dtype: bool

To access the object columns:

In [15]: df.loc[:, df.dtypes == object]
Out[15]: 
   1
0  a

I think it's most explicit to use (I'm not sure that inplace would work here):

In [16]: df.loc[:, df.dtypes == object] = df.loc[:, df.dtypes == object].fillna('')

That said, I recommend you use NaN for missing data.
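A variant of the mask idea, sketched under assumptions: the frame and the column names `label` and `score` are hypothetical, and converting the mask to a list of column names before assigning is my choice here, not necessarily what the answer's author intended.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one object column and one float64 column.
df = pd.DataFrame({
    "label": pd.Series(["x", None, "z"], dtype=object),
    "score": [0.5, np.nan, 2.0],
})

# Boolean mask over the columns: True where the column dtype is object.
is_object = df.dtypes == object

# Turn the mask into a plain list of column names.
object_cols = df.columns[is_object.to_numpy()].tolist()

# Fill missing values in the object columns only; the float NaN is untouched.
df[object_cols] = df[object_cols].fillna("")

print(df)
```

Keeping the float columns out of the `fillna` call means no string is ever written into a float64 column.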

answered Sep 21 '22 21:09

Andy Hayden

As @RNA said, you can use pandas.DataFrame.select_dtypes. Using the example from your question, the code would look like this:

for col in df.select_dtypes(include=['object']).columns:
    df[col] = df[col].fillna('unknown')
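The same result can probably be reached without an explicit loop, because `DataFrame.fillna` also accepts a dict that maps column names to fill values. The sketch below assumes a made-up frame with columns `city` and `temp`; it is an illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: one object column and one float64 column.
df = pd.DataFrame({
    "city": pd.Series(["Oslo", None], dtype=object),
    "temp": [np.nan, 4.2],
})

# Labels of the object-dtype columns.
object_cols = df.select_dtypes(include=["object"]).columns

# Map each object column to its fill value; columns not in the dict are not filled.
filled = df.fillna({col: "unknown" for col in object_cols})

print(filled)
```

This returns a new dataframe and leaves `df` unchanged, so the missing float value in `temp` stays NaN in both.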

answered Sep 21 '22 21:09

Jaroslav Bezděk

