Merging dataframes with unhashable columns

I want to merge two pandas DataFrames. Rows with the same item code (e.g. A, B, C, D) always have the same attributes a and b, but b is a NumPy array or a list, which is unhashable.

Foo:

item   a     b              
A      1     [2,0] 
B      1     [3,0]         
C      0     [4,0]         

Bar:

item   a     b
A      1     [2,0]
D      0     [6,1]

This is the result I want:

item   a     b        Foo   Bar
A      1     [2,0]    1     1
B      1     [3,0]    1     0
C      0     [4,0]    1     0
D      0     [6,1]    0     1
niukasu asked Dec 01 '22 11:12



1 Answer

If b already holds hashable values such as tuples, you could use DataFrame.merge and DataFrame.fillna:

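# Flag each frame's rows, outer-merge on the shared columns (item, a, b), then fill missing flags with 0.
out = foo.assign(Foo=1).merge(bar.assign(Bar=1), how='outer').fillna(0)
print(out)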

  item  a       b  Foo  Bar
0    A  1  (2, 0)  1.0  1.0
1    B  1  (3, 0)  1.0  0.0
2    C  0  (4, 0)  1.0  0.0
3    D  0  (6, 1)  0.0  1.0
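
For reference, here is a minimal self-contained sketch of that approach, assuming b already holds tuples as in the output above. The sample frames are rebuilt from the question's tables, and the final astype cast is an addition (not part of the original answer) that turns the 1.0/0.0 flags back into integers.

```python
import pandas as pd

# Sample data from the question, with b stored as tuples (hashable).
foo = pd.DataFrame(
    {"item": ["A", "B", "C"], "a": [1, 1, 0], "b": [(2, 0), (3, 0), (4, 0)]}
)
bar = pd.DataFrame({"item": ["A", "D"], "a": [1, 0], "b": [(2, 0), (6, 1)]})

out = (
    foo.assign(Foo=1)                       # mark rows that come from foo
    .merge(bar.assign(Bar=1), how="outer")  # join on the shared columns item, a, b
    .fillna(0)                              # rows missing from one side get flag 0
    .astype({"Foo": int, "Bar": int})       # fillna leaves floats; cast back to int
)
print(out)
```

The cast is optional; it only matters if integer flags are preferred over floats.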

If b holds lists (or NumPy arrays), which are unhashable, you could convert each value to a tuple first, merge, and then convert back to lists.

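# Lists are unhashable, so convert b to tuples before merging.
foo['b'] = foo['b'].apply(tuple)
bar['b'] = bar['b'].apply(tuple)
out = foo.assign(Foo=1).merge(bar.assign(Bar=1), how='outer').fillna(0)
# Convert b back to lists after the merge.
out['b'] = out['b'].apply(list)

print(out)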

  item  a       b  Foo  Bar
0    A  1  [2, 0]  1.0  1.0
1    B  1  [3, 0]  1.0  0.0
2    C  0  [4, 0]  1.0  0.0
3    D  0  [6, 1]  0.0  1.0
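
Since the question mentions that b may be a NumPy array as well as a list, the round trip can be wrapped in a small helper. This is only a sketch: the names to_key and merge_with_flags are hypothetical, and it assumes b holds flat (1-D) lists or arrays. Going through np.asarray(...).tolist() normalizes both kinds of value to tuples of plain Python scalars, and the input frames are left unmodified.

```python
import numpy as np
import pandas as pd


def to_key(value):
    """Turn a list or 1-D NumPy array into a hashable tuple of Python scalars."""
    return tuple(np.asarray(value).tolist())


def merge_with_flags(foo, bar, col="b"):
    """Outer-merge foo and bar on their shared columns, flagging each row's origin."""
    # assign() returns copies, so the callers' frames are not changed.
    left = foo.assign(**{col: foo[col].apply(to_key), "Foo": 1})
    right = bar.assign(**{col: bar[col].apply(to_key), "Bar": 1})
    out = left.merge(right, how="outer").fillna(0)
    # Restore the list representation after the merge.
    out[col] = out[col].apply(list)
    return out


# foo stores b as lists, bar stores b as NumPy arrays.
foo = pd.DataFrame(
    {"item": ["A", "B", "C"], "a": [1, 1, 0], "b": [[2, 0], [3, 0], [4, 0]]}
)
bar = pd.DataFrame(
    {"item": ["A", "D"], "a": [1, 0], "b": [np.array([2, 0]), np.array([6, 1])]}
)

out = merge_with_flags(foo, bar)
print(out)
```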
cs95 answered Dec 09 '22 16:12
