Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pandas: fillna with data from another dataframe, based on the same ID

df1 has missing values:

df1=

    ID age 
    1  12 
    2  na
    3  23
    4  na
    5  na
    6  na 

and I have another df:

df2=

    ID age
    2   4
    4   5
    5   6 
    6   7

I want to fillna of df1, using df2, based on same ID:

df1 (after fillna)=

    ID age 
    1  12 
    2  4
    3  23
    4  5
    5  6
    6  7
like image 219
Robin1988 Avatar asked Jan 20 '17 22:01

Robin1988


2 Answers

You can set ID as index for both dataframes, and then use the fillna() method, which fill missing values, while matching the index of the two dataframes:

df1.set_index("ID").age.fillna(df2.set_index("ID").age).reset_index()

#  ID   age
#0  1   12
#1  2   4
#2  3   23
#3  4   5
#4  5   6
#5  6   7

Another option is, combine_first, which takes values from the first dataframe, if not null, otherwise takes values from the second dataframe with index and columns matched:

df1.set_index("ID").combine_first(df2.set_index("ID")).reset_index()

#  ID   age
#0  1   12.0
#1  2   4.0
#2  3   23.0
#3  4   5.0
#4  5   6.0
#5  6   7.0
like image 64
Psidom Avatar answered Oct 06 '22 08:10

Psidom


Check where the nulls are and then impute the value at those places.

miss_bool = df1.age.isnull() 
df2 = df2.set_index('Id')

df1.loc[miss_bool, 'age'] = df1.loc[miss_bool, 'Id'].apply(lambda x: df2.age[x])
like image 40
GpandaM Avatar answered Oct 06 '22 07:10

GpandaM