
How to merge two data frames while excluding the NaN value column?

Tags: python, pandas

If df1 is:

   size_a  size_b
0       1       2
1       1       5
2       2       3
3       2       9
4       3       1
5       3       5
6       4       4

and df2 is:

   size_a  size_b
0       1       2
1       2     NaN
2       3     NaN

I want the result as:

   size_a  size_b
0       1       2
1       2       3
2       2       9
3       3       1
4       3       5

For the intersection I want to consider only the non-NaN values of df2: wherever df2 has a NaN, that column's value should be ignored when performing the intersection.

javed asked Aug 07 '17 14:08


2 Answers

I think you can merge them twice and concatenate the results:

a. Normal merge:

part1 = pd.merge(df1, df2)

b. Merge the subset of rows with NaNs:

nans = df2[df2.size_b.isnull()]
part2 = pd.merge(df1, nans[["size_a"]], on="size_a")

c. Concatenate them:

pd.concat([part1, part2], ignore_index=True)

The result:

   size_a size_b
0       1      2
1       2      3
2       2      9
3       3      1
4       3      5
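
The three steps above can be run end to end as follows. This is only a minimal sketch: the frames are rebuilt from the question's sample data, the explicit on=[...] list is spelled out for readability, and the final cast back to int is an added assumption (the NaNs make size_b a float column in df2, so the merged column may come back as float).

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"size_a": [1, 1, 2, 2, 3, 3, 4],
                    "size_b": [2, 5, 3, 9, 1, 5, 4]})
df2 = pd.DataFrame({"size_a": [1, 2, 3],
                    "size_b": [2, np.nan, np.nan]})

# a. Normal merge: only df2 rows with a real size_b can match here,
#    because df1 contains no NaN in size_b.
part1 = pd.merge(df1, df2, on=["size_a", "size_b"])

# b. df2 rows with NaN in size_b are treated as wildcards:
#    match them on size_a alone.
nans = df2[df2["size_b"].isnull()]
part2 = pd.merge(df1, nans[["size_a"]], on="size_a")

# c. Stack both parts into one frame.
result = pd.concat([part1, part2], ignore_index=True)

# Assumption: the result holds no NaN, so size_b can be cast back to int.
result["size_b"] = result["size_b"].astype(int)
print(result)
```

With the sample data this keeps the five rows listed in the question and drops (1, 5) and (4, 4).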
Huang answered Nov 02 '22 23:11


One way is to first join on the column(s) that require a non-wildcard join. This reduces the conditional filters you have to build downstream. In the example above, size_a is one of those columns:

new_df = df1.merge(df2, how='inner', on='size_a')

Next, apply the filter conditions: keep a row where each of the other columns either matches or is NaN in df2.

new_df = new_df[(new_df['size_b_x'] == new_df['size_b_y']) | new_df['size_b_y'].isnull()]

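Finally, drop the unnecessary column(s) that came from df2 (denoted by the _y suffix in the column names) and rename the remaining _x column back to its original name. Pass the column by keyword, since recent pandas versions no longer accept the axis as a positional argument to drop:

new_df = new_df.drop(columns='size_b_y').rename(columns={'size_b_x': 'size_b'})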
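
For reference, here is the join-then-filter approach as one runnable sketch. The frames are rebuilt from the question's sample data, and the keep mask and the final reset_index are illustrative additions, not something the question requires.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df1 = pd.DataFrame({"size_a": [1, 1, 2, 2, 3, 3, 4],
                    "size_b": [2, 5, 3, 9, 1, 5, 4]})
df2 = pd.DataFrame({"size_a": [1, 2, 3],
                    "size_b": [2, np.nan, np.nan]})

# Join on the column that never acts as a wildcard.
# Overlapping size_b columns get the default _x (df1) and _y (df2) suffixes.
new_df = df1.merge(df2, how="inner", on="size_a")

# Keep a row if size_b matches, or if df2 left size_b unspecified (NaN).
keep = (new_df["size_b_x"] == new_df["size_b_y"]) | new_df["size_b_y"].isnull()
new_df = new_df[keep]

# Drop df2's helper column and restore the original column name.
new_df = (new_df.drop(columns="size_b_y")
                .rename(columns={"size_b_x": "size_b"})
                .reset_index(drop=True))
print(new_df)
```

Because size_b is taken from df1 throughout, it stays an integer column and needs no cast afterwards.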
Scratch'N'Purr answered Nov 03 '22 00:11