Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does Pandas inner join give ValueError: len(left_on) must equal the number of levels in the index of "right"?

I'm trying to inner join DataFrame A to DataFrame B and am running into an error.

Here's my join statement:

merged = DataFrameA.join(DataFrameB, on=['Code','Date']) 

And here's the error:

ValueError: len(left_on) must equal the number of levels in the index of "right" 

I'm not sure the column order matters (they aren't truly "ordered" are they?), but just in case, the DataFrames are organized like this:

DataFrameA:  Code, Date, ColA, ColB, ColC, ..., ColG, ColH (shape: 80514, 8 - no index) DataFrameB:  Date, Code, Col1, Col2, Col3, ..., Col15, Col16 (shape: 859, 16 - no index) 

Do I need to correct my join statement? Or is there another, better way to get the intersection (or inner join) of these two DataFrames?

like image 795
Ian Joyce Avatar asked Jan 30 '15 03:01

Ian Joyce


People also ask

How does inner join work in pandas?

Inner Join in Pandas Inner join is the most common type of join you'll be working with. It returns a dataframe with only those rows that have common characteristics. An inner join requires each row in the two joined dataframes to have matching column values. This is similar to the intersection of two sets.

What is ValueError in Python pandas?

One of the most commonly reported error in pandas is ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all() and it may sometimes be quite tricky to deal with, especially if you are new to pandas library (or even Python).

Is at and LOC same in pandas?

at is a single element and using . loc maybe a Series or a DataFrame. Returning single value is not the case always. It returns array of values if the provided index is used multiple times.

Is join and merge same in pandas?

Both join and merge can be used to combines two dataframes but the join method combines two dataframes on the basis of their indexes whereas the merge method is more versatile and allows us to specify columns beside the index to join on for both dataframes.


1 Answers

use merge if you are not joining on the index:

merged = pd.merge(DataFrameA,DataFrameB, on=['Code','Date']) 

Follow up to question below:

Here is a reproducible example:

import pandas as pd # create some timestamps for date column i = pd.to_datetime(pd.date_range('20140601',periods=2))  #create two dataframes to merge df = pd.DataFrame({'code': ['ABC','EFG'], 'date':i,'col1': [10,100]}) df2 = pd.DataFrame({'code': ['ABC','EFG'], 'date':i,'col2': [10,200]})  #merge on columns (default join is inner) pd.merge(df, df2, on =['code','date']) 

This results is:

    code    col1    date    col2 0   ABC     10      2014-06-01  10 1   EFG     100     2014-06-02  200 

What happens when you run this code?

like image 57
JAB Avatar answered Sep 19 '22 23:09

JAB