
Using Merge on a column and Index in Pandas

I have two separate dataframes that share a project number. In type_df, the project number is the index. In time_df, the project number is a column. I would like to count the number of rows in time_df whose project has a Project Type of 'Type 2'. I am trying to do this with pandas.merge(). It works well when the key is a column in both frames, but not when it is the index of one of them. I'm not sure how to reference the index, or whether merge is even the right way to do this.

import pandas as pd

type_df = pd.DataFrame(data = [['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data = [['Project1', 13], ['Project1', 12],
                               ['Project2', 41]],
                       columns=['Project', 'Time'])
merged = pd.merge(time_df,type_df, on=[index,'Project'])
print merged[merged['Project Type'] == 'Type 2']['Project Type'].count()

Error:

NameError: name 'index' is not defined

Desired Output:

2 
user2242044 asked Jul 21 '15 01:07





2 Answers

If you want to use an index in your merge, you have to specify left_index=True or right_index=True for the frame whose index holds the key, and then use right_on or left_on to name the key column in the other frame. In your case it should look something like this:

merged = pd.merge(type_df, time_df, left_index=True, right_on='Project') 
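As a complete sketch using the frames from the question, the merge above can be followed by a boolean count to get the desired number. The variable name type2_count is illustrative and not part of the original answer.

```python
import pandas as pd

type_df = pd.DataFrame(
    data=[["Type 1"], ["Type 2"]],
    columns=["Project Type"],
    index=["Project2", "Project1"],
)
time_df = pd.DataFrame(
    data=[["Project1", 13], ["Project1", 12], ["Project2", 41]],
    columns=["Project", "Time"],
)

# Match type_df's index against time_df's "Project" column.
merged = pd.merge(type_df, time_df, left_index=True, right_on="Project")

# Count the merged rows whose project is of Type 2
# (type2_count is an illustrative name).
type2_count = int((merged["Project Type"] == "Type 2").sum())
print(type2_count)  # 2
```

Summing the boolean mask gives the same number as the filter-then-count expression in the question, without building an intermediate frame.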
maxymoo answered Oct 16 '22 00:10



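Another solution is to use DataFrame.join. Its on argument names a column in the calling frame, which is matched against the index of the other frame, so call it on time_df:

df3 = time_df.join(type_df, on='Project')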
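For the frames in the question, a runnable sketch of the join approach might look like the following, with time_df as the caller. The names joined and type2_count are illustrative.

```python
import pandas as pd

type_df = pd.DataFrame(
    data=[["Type 1"], ["Type 2"]],
    columns=["Project Type"],
    index=["Project2", "Project1"],
)
time_df = pd.DataFrame(
    data=[["Project1", 13], ["Project1", 12], ["Project2", 41]],
    columns=["Project", "Time"],
)

# `on` names a column of the caller (time_df); it is matched
# against the index of the frame passed in (type_df).
joined = time_df.join(type_df, on="Project")

# Count the time_df rows whose project is of Type 2.
type2_count = int((joined["Project Type"] == "Type 2").sum())
print(type2_count)  # 2
```

join defaults to a left join, so every row of time_df is kept and "Project Type" is filled in from type_df where the project matches.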

From pandas 0.23.0 onward, the on, left_on, and right_on parameters may refer to either column names or index level names:

left_index = pd.Index(['K0', 'K0', 'K1', 'K2'], name='key1')
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3'],
                     'key2': ['K0', 'K1', 'K0', 'K1']},
                    index=left_index)

right_index = pd.Index(['K0', 'K1', 'K2', 'K2'], name='key1')
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3'],
                      'key2': ['K0', 'K0', 'K0', 'K1']},
                     index=right_index)

print (left)
       A   B key2
key1
K0    A0  B0   K0
K0    A1  B1   K1
K1    A2  B2   K0
K2    A3  B3   K1

print (right)
       C   D key2
key1
K0    C0  D0   K0
K1    C1  D1   K0
K2    C2  D2   K0
K2    C3  D3   K1

df = left.merge(right, on=['key1', 'key2'])
print (df)
       A   B key2   C   D
key1
K0    A0  B0   K0  C0  D0
K1    A2  B2   K0  C1  D1
K2    A3  B3   K1  C3  D3
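Applied to the frames from the question, the same feature could be used by giving type_df's index a name that matches the column in time_df. This is a sketch that assumes pandas 0.23.0 or later; rename_axis is used here only to name the index, and named_type_df and type2_count are illustrative names.

```python
import pandas as pd

type_df = pd.DataFrame(
    data=[["Type 1"], ["Type 2"]],
    columns=["Project Type"],
    index=["Project2", "Project1"],
)
time_df = pd.DataFrame(
    data=[["Project1", 13], ["Project1", 12], ["Project2", 41]],
    columns=["Project", "Time"],
)

# Name the index so that `on` can refer to it as an index level.
named_type_df = type_df.rename_axis("Project")

# "Project" is a column in time_df and an index level in named_type_df.
merged = time_df.merge(named_type_df, on="Project")

# Count the merged rows whose project is of Type 2.
type2_count = int((merged["Project Type"] == "Type 2").sum())
print(type2_count)  # 2
```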
jezrael answered Oct 15 '22 23:10
