Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fill dataframe nan values from a join

Tags:

python

pandas

I am trying to map owners to an IP address through the use of two tables, df1 & df2. df1 contains the IP list to be mapped and df2 contains an IP, an alias, and the owner. After running a join on the IP column, it gives me a half joined dataframe. Most of the remaining data can be joined by replacing the NaN values with a join on the Alias column, but I can’t figure out how to do it.

My initial thoughts were to try nesting pd.merge inside fillna(), but it won't accept a dataframe. Any help would be greatly appreciated.

df1 = pd.DataFrame({'IP' : ['192.18.0.100', '192.18.0.101', '192.18.0.102', '192.18.0.103', '192.18.0.104']})
df2 = pd.DataFrame({'IP' : ['192.18.0.100', '192.18.0.101', '192.18.1.206', '192.18.1.218', '192.18.1.118'],
                    'Alias' : ['192.18.1.214', '192.18.1.243', '192.18.0.102', '192.18.0.103', '192.18.1.180'],
                    'Owner' : ['Smith, Jim', 'Bates, Andrew', 'Kline, Jenny', 'Hale, Fred', 'Harris, Robert']})
new_df = pd.DataFrame(pd.merge(df1, df2[['IP', 'Owner']], on='IP', how= 'left'))

Expected output is:

IP            Owner
192.18.0.100  Smith, Jim
192.18.0.101  Bates, Andrew
192.18.0.102  Kline, Jenny
192.18.0.103  Hale, Fred
192.18.0.104  nan
like image 919
Trace R. Avatar asked Sep 05 '19 18:09

Trace R.


3 Answers

No need to merge, Just pull data where condition satisfies. This is way faster than merge and less complicated.

condition = (df1['IP'] == df2['IP']) | (df1['IP'] == df2['Alias'])
df1['Owner'] = np.where(condition, df2['Owner'], np.nan)
print(df1)
             IP          Owner
0  192.18.0.100     Smith, Jim
1  192.18.0.101  Bates, Andrew
2  192.18.0.102   Kline, Jenny
3  192.18.0.103     Hale, Fred
4  192.18.0.104            NaN
like image 167
Vishnudev Avatar answered Oct 07 '22 08:10

Vishnudev


Try this one:

new_df = pd.DataFrame(pd.merge(df1, pd.concat([df2[['IP', 'Owner']], df2[['Alias', 'Owner']].rename(columns={"Alias": "IP"})]).drop_duplicates(), on='IP', how= 'left'))

The result:

>>> new_df
             IP          Owner
0  192.18.0.100     Smith, Jim
1  192.18.0.101  Bates, Andrew
2  192.18.0.102   Kline, Jenny
3  192.18.0.103     Hale, Fred
4  192.18.0.104            NaN
like image 26
Grzegorz Skibinski Avatar answered Oct 07 '22 08:10

Grzegorz Skibinski


Let's melt then use map:

df1['IP'].map(df2.melt('Owner').set_index('value')['Owner'])

Output:

0       Smith, Jim
1    Bates, Andrew
2     Kline, Jenny
3       Hale, Fred
4              NaN
Name: IP, dtype: object
like image 39
Scott Boston Avatar answered Oct 07 '22 09:10

Scott Boston