How to pass another entire column as argument to pandas fillna()


Does fillna() fill NaN?

Yes. fillna() fills NA/NaN values, either with a given value or with a fill method. The value can be a scalar (e.g. 0) or a dict/Series/DataFrame specifying which value to use for each index (for a Series) or for each column (for a DataFrame).
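Here is a minimal sketch of the scalar and dict forms of value, using small made-up data. The column names a and b are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
# Scalar value: every NaN in the Series becomes 0
filled_scalar = s.fillna(0)

df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})
# Dict value: one fill value per column; columns not in the dict are left alone
filled_dict = df.fillna({"a": -1.0})

print(filled_scalar.tolist())  # [1.0, 0.0, 3.0]
print(filled_dict)
```

Column b keeps its NaN because it is not a key in the dict.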

You can pass the other column directly to fillna (see the docs). It uses the values at matching index labels to fill the gaps:

In [17]: df['Cat1'].fillna(df['Cat2'])
Out[17]:
0    cat
1    dog
2    cat
3    ant
Name: Cat1, dtype: object
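Matching happens by index label, not by position. The sketch below uses made-up Series, including a fill Series listed in a different order and one that lacks some labels:

```python
import numpy as np
import pandas as pd

cat1 = pd.Series(["cat", np.nan, "cat", np.nan], index=[0, 1, 2, 3])
# Same labels as cat1, deliberately listed in a different order
cat2 = pd.Series(["ant", "dog", "mouse", "bee"], index=[3, 1, 0, 2])

result = cat1.fillna(cat2)
print(result.tolist())  # ['cat', 'dog', 'cat', 'ant']

# Label 3 has no counterpart in this fill Series, so it stays NaN
partial = cat1.fillna(pd.Series(["x"], index=[1]))
```

The result keeps the order and index of cat1. Each NaN is filled from the entry with the same label in cat2.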

Alternatively, you could use np.where:

df.Cat1 = np.where(df.Cat1.isnull(), df.Cat2, df.Cat1)

The expression on the right-hand side uses the if-then-else idiom from the pandas cookbook (worth reading in any case). It is a vectorized version of a ? b : c.
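The same if-then-else can also be written with the pandas methods Series.where and Series.mask. Unlike np.where, which returns a plain NumPy array, these keep the index. Here is a sketch on a reconstruction of the question's frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Cat1": ["cat", "dog", "cat", np.nan],
                   "Cat2": ["mouse", "elephant", "giraf", "ant"]})

# np.where(cond, a, b): take a where cond is True, else b; returns an ndarray
via_np = np.where(df["Cat1"].isnull(), df["Cat2"], df["Cat1"])

# Series.where keeps values where cond is True and substitutes `other` elsewhere
via_where = df["Cat1"].where(df["Cat1"].notnull(), df["Cat2"])

# Series.mask is the inverse: it replaces values where cond is True
via_mask = df["Cat1"].mask(df["Cat1"].isnull(), df["Cat2"])

print(type(via_np).__name__)  # ndarray
print(via_where.tolist())     # ['cat', 'dog', 'cat', 'ant']
```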

Just use the value parameter instead of method:

In [20]: df
Out[20]:
  Cat1      Cat2  Day
0  cat     mouse    1
1  dog  elephant    2
2  cat     giraf    3
3  NaN       ant    4

In [21]: df.Cat1 = df.Cat1.fillna(value=df.Cat2)

In [22]: df
Out[22]:
  Cat1      Cat2  Day
0  cat     mouse    1
1  dog  elephant    2
2  cat     giraf    3
3  ant       ant    4
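The same fix also works as a plain script outside IPython. The second variant below passes a dict to DataFrame.fillna that maps a column name to the Series that should fill it. This is a sketch assuming a reasonably recent pandas, and it is handy when several columns need filling at once:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Cat1": ["cat", "dog", "cat", np.nan],
                   "Cat2": ["mouse", "elephant", "giraf", "ant"],
                   "Day": [1, 2, 3, 4]})

# Variant 1: fill one column from another and assign the result back
df1 = df.copy()
df1["Cat1"] = df1["Cat1"].fillna(value=df1["Cat2"])

# Variant 2: DataFrame.fillna with a {column: fill values} dict;
# returns a new frame and leaves df unchanged
df2 = df.fillna({"Cat1": df["Cat2"]})

print(df1["Cat1"].tolist())  # ['cat', 'dog', 'cat', 'ant']
```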

pandas.Series.combine_first also works (pandas.DataFrame.combine_first is the frame-level counterpart).

(Note: per the docs, the "Result index columns will be the union of the respective indexes and columns", so check that the indexes, and for DataFrames the columns, match. Labels that exist only in the other object are added to the result.)

import numpy as np
import pandas as pd
df = pd.DataFrame([["1", "cat", "mouse"],
                   ["2", "dog", "elephant"],
                   ["3", "cat", "giraf"],
                   ["4", np.nan, "ant"]],
                  columns=["Day", "Cat1", "Cat2"])

In: df["Cat1"].combine_first(df["Cat2"])
Out: 
0    cat
1    dog
2    cat
3    ant
Name: Cat1, dtype: object
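The index-union caveat is easy to see with two Series whose indexes differ. In this made-up example, label 4 exists only in the second Series. combine_first adds it to the result, while fillna keeps the index of the calling Series:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(["cat", np.nan], index=[0, 3])
s2 = pd.Series(["mouse", "ant", "bee"], index=[0, 3, 4])

combined = s1.combine_first(s2)  # index is the union: 0, 3, 4
filled = s1.fillna(s2)           # index stays that of s1: 0, 3

print(combined.tolist())  # ['cat', 'ant', 'bee']
print(filled.tolist())    # ['cat', 'ant']
```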

Timings on this 4-row frame, compared with the other answers:

%timeit df["Cat1"].combine_first(df["Cat2"])
181 µs ± 11.3 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit df['Cat1'].fillna(df['Cat2'])
253 µs ± 10.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit np.where(df.Cat1.isnull(), df.Cat2, df.Cat1)
88.1 µs ± 793 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
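On a 4-row frame, these timings likely reflect mostly fixed per-call overhead, so the ranking may differ on larger data. Below is a rough sketch for rerunning the comparison on a bigger, randomly generated frame with the standard timeit module. It also checks that the three approaches agree. The size, the roughly 10% missing rate and the repeat counts are arbitrary choices, and no particular results are claimed here.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 100_000
cat1 = pd.Series(rng.choice(["cat", "dog"], size=n), dtype=object)
cat1[rng.random(n) < 0.1] = np.nan  # make roughly 10% of Cat1 missing
cat2 = pd.Series(rng.choice(["ant", "mouse"], size=n), dtype=object)
df = pd.DataFrame({"Cat1": cat1, "Cat2": cat2})

candidates = {
    "combine_first": lambda: df["Cat1"].combine_first(df["Cat2"]),
    "fillna": lambda: df["Cat1"].fillna(df["Cat2"]),
    "np.where": lambda: np.where(df["Cat1"].isnull(), df["Cat2"], df["Cat1"]),
}

# Sanity check: collect each approach's output as a plain list for comparison
results = {name: list(fn()) for name, fn in candidates.items()}

for name, fn in candidates.items():
    # Best of 3 repeats, averaged over 10 calls each
    seconds = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f"{name:>14}: {seconds * 1e3:.2f} ms per call")
```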

I did not include the following row-wise approach in the comparison above:

def is_missing(Cat1,Cat2):    
    if np.isnan(Cat1):        
        return Cat2
    else:
        return Cat1

df['Cat1'] = df.apply(lambda x: is_missing(x['Cat1'],x['Cat2']),axis=1)

because it raises an exception:

TypeError: ("ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''", 'occurred at index 0')

This happens because np.isnan only accepts numeric input (e.g. np.float64 scalars or arrays of a native numeric dtype). It raises TypeError on strings such as "cat" and on object arrays.
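A quick check of the difference follows. pd.isnull is an alias of pd.isna and accepts strings, None and NaN alike:

```python
import numpy as np
import pandas as pd

# pd.isnull works on any scalar, including strings and None
print(pd.isnull("cat"), pd.isnull(np.nan), pd.isnull(None))  # False True True

# np.isnan only accepts numeric input, so a string raises TypeError
try:
    np.isnan("cat")
    isnan_raised = False
except TypeError as exc:
    isnan_raised = True
    print("TypeError:", exc)
```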

So I revised the function to use pd.isnull, which accepts values of any type:

def is_missing(Cat1,Cat2):    
    if pd.isnull(Cat1):        
        return Cat2
    else:
        return Cat1

%timeit df.apply(lambda x: is_missing(x['Cat1'],x['Cat2']),axis=1)
701 µs ± 7.38 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
