 

Bug in pandas.DataFrame.merge?

The following:

import pandas as pd

q = pd.DataFrame([[1,2],[3,4]])
r = pd.DataFrame([[1,2],[5,6]], columns=['a','b'])
pd.merge(q, r, left_on=q.columns, right_on=r.columns, how='left')

raises an error:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

The following doesn't:

q = pd.DataFrame([[1,2],[3,4]])
r = pd.DataFrame([[1,2],[5,6]], columns=['a','b'])
pd.merge(q, r, left_on=q.columns.tolist(), right_on=r.columns.tolist(), how='left')

Is this a bug?

user2725109 asked Dec 07 '25 16:12



1 Answer

It depends on what pandas considers array-like. It might also be a bug in the documentation.

pandas checks the type of the left_on and right_on parameters (see the _maybe_make_list function in the pandas source). Since neither is a tuple or a list (q.columns is a RangeIndex and r.columns is an Index), each one is wrapped in a list as a single element, so the comparison it performs is basically:

[q.columns] == [r.columns]

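instead of comparing the column labels directly. Python's list equality has to take the truth value of q.columns == r.columns, which is an elementwise boolean array rather than a single True or False, and that is what raises the ValueError.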
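The mechanism can be reproduced outside of merge. The snippet below is only a sketch of the comparison described above, not the actual pandas code path; the names elementwise and error_message are made up for illustration.

```python
import pandas as pd

q = pd.DataFrame([[1, 2], [3, 4]])
r = pd.DataFrame([[1, 2], [5, 6]], columns=["a", "b"])

# Comparing two Index objects of equal length is elementwise:
# the result is a boolean array, not a single bool.
elementwise = q.columns == r.columns

# List equality needs one True/False per pair of elements, so it
# calls bool() on that array, which is ambiguous for length > 1.
try:
    [q.columns] == [r.columns]
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(len(elementwise), error_message)
```

If the error message printed here matches the one from the question, that supports the explanation that the Index objects are being treated as single labels rather than as lists of labels.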

The documentation describes left_on as "label or list, or array-like". I couldn't find a definition of array-like in pandas, but in this case it seems to be limited to tuple or list.
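Given that, the safe approach is to hand merge plain lists, as the second snippet in the question does with tolist(). Here is a minimal sketch; as_key_list is a hypothetical helper written for this answer, not part of pandas.

```python
import pandas as pd

q = pd.DataFrame([[1, 2], [3, 4]])
r = pd.DataFrame([[1, 2], [5, 6]], columns=["a", "b"])


def as_key_list(keys):
    """Hypothetical helper: return merge keys as a plain list of labels."""
    if isinstance(keys, (list, tuple)):
        return list(keys)
    if isinstance(keys, pd.Index):
        return keys.tolist()
    return [keys]  # assume anything else is a single label


merged = pd.merge(
    q,
    r,
    left_on=as_key_list(q.columns),
    right_on=as_key_list(r.columns),
    how="left",
)
print(merged)
```

With how='left', both rows of q are kept, and the row without a match in r gets NaN in the a and b columns.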

Dennis Golomazov answered Dec 10 '25 10:12
