I have two separate DataFrames that share a project number. In type_df, the project number is the index. In time_df, the project number is a column. I would like to count the number of rows in time_df whose project has a Project Type of 'Type 2'. I am trying to do this with pandas.merge(). It works great when both keys are columns, but not when one of them is an index. I'm not sure how to reference the index, or whether merge is even the right way to do this.
import pandas as pd

type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12],
                             ['Project2', 41]],
                       columns=['Project', 'Time'])

# This is the line that fails: `index` is not a defined name
merged = pd.merge(time_df, type_df, on=[index, 'Project'])
print(merged[merged['Project Type'] == 'Type 2']['Project Type'].count())

Error:
NameError: name 'index' is not defined
Desired Output:
2 
The merge() function is used to merge DataFrame or named Series objects with a database-style join. The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise, if joining indexes on indexes or indexes on a column or columns, the index will be passed on.
You can use pandas.merge() to merge DataFrames by matching their indexes. When merging two DataFrames on the index, set both the left_index and right_index parameters of merge() to True.
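A minimal sketch of an index-on-index merge. The owners frame and its Owner column are hypothetical and not part of the question; they are only there to give the second side an index to match against.

```python
import pandas as pd

# Frame from the question: project name is the index.
types = pd.DataFrame({"Project Type": ["Type 1", "Type 2"]},
                     index=["Project2", "Project1"])

# Hypothetical second frame, also indexed by project name.
owners = pd.DataFrame({"Owner": ["Ann", "Bob"]},
                      index=["Project1", "Project3"])

# Match rows by index label on both sides (inner join by default),
# so only labels present in both indexes survive.
both = pd.merge(types, owners, left_index=True, right_index=True)
print(both)
```

Only Project1 appears in both indexes, so the result should contain that single row.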
To merge two pandas DataFrames on multiple columns, pass a list of column names to the on parameter of pandas.merge(). merge() is considered versatile and flexible, and the same method is also available on DataFrame itself as DataFrame.merge().
The main difference between merge and concat is that merge performs a structured, database-style join of tables on key values, whereas concat is broader and less structured: it stacks objects along an axis without matching keys.
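To make the merge versus concat distinction concrete, here is a small sketch on two hypothetical frames (a, b, and their columns are made up for illustration):

```python
import pandas as pd

a = pd.DataFrame({"key": ["K0", "K1"], "x": [1, 2]})
b = pd.DataFrame({"key": ["K1", "K2"], "y": [3, 4]})

# merge: rows are paired up where the values of "key" match
# (inner join by default), so only K1 survives.
merged = pd.merge(a, b, on="key")

# concat: the frames are stacked on top of each other; key values are
# not compared, and columns missing on one side are filled with NaN.
stacked = pd.concat([a, b], ignore_index=True)

print(merged)
print(stacked)
```

Here merged should have one row and stacked four, which is usually the quickest way to tell which of the two operations a task calls for.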
If you want to use an index in your merge, you have to specify left_index=True or right_index=True for the frame whose index is the key, and then use right_on or left_on to name the key column of the other frame. For you it should look something like this:
merged = pd.merge(type_df, time_df, left_index=True, right_on='Project') 
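As a quick check, that merge can be run end to end against the frames from the question. The count variable and the boolean-sum idiom are my additions for illustration; the original post counted through a filtered column instead.

```python
import pandas as pd

type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12],
                             ['Project2', 41]],
                       columns=['Project', 'Time'])

# Left key: the index of type_df. Right key: the 'Project' column of time_df.
merged = pd.merge(type_df, time_df, left_index=True, right_on='Project')

# Count the merged rows whose project is of type 'Type 2'.
count = int((merged['Project Type'] == 'Type 2').sum())
print(count)  # 2
```

Project1 is the only 'Type 2' project and has two rows in time_df, which gives the desired output of 2.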
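Another solution is to use DataFrame.join. Its on parameter names a column of the calling DataFrame, which is matched against the index of the other one, so time_df has to be the caller here:
df3 = time_df.join(type_df, on='Project')

For pandas version 0.23.0+, the on, left_on, and right_on parameters may now refer to either column names or index level names: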
left_index = pd.Index(['K0', 'K0', 'K1', 'K2'], name='key1')
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3'],
                     'key2': ['K0', 'K1', 'K0', 'K1']},
                    index=left_index)

right_index = pd.Index(['K0', 'K1', 'K2', 'K2'], name='key1')
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3'],
                      'key2': ['K0', 'K0', 'K0', 'K1']},
                     index=right_index)

print (left)
       A   B key2
key1
K0    A0  B0   K0
K0    A1  B1   K1
K1    A2  B2   K0
K2    A3  B3   K1

print (right)
       C   D key2
key1
K0    C0  D0   K0
K1    C1  D1   K0
K2    C2  D2   K0
K2    C3  D3   K1

df = left.merge(right, on=['key1', 'key2'])
print (df)
       A   B key2   C   D
key1
K0    A0  B0   K0  C0  D0
K1    A2  B2   K0  C1  D1
K2    A3  B3   K1  C3  D3
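Both alternatives can also be tried on the frames from the question. This is a sketch under two assumptions: time_df is the caller for join (because on refers to the caller's column), and the index of type_df is given the name 'Project' via rename_axis, which the original data does not have. The helper name count_type2 is hypothetical.

```python
import pandas as pd

type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12],
                             ['Project2', 41]],
                       columns=['Project', 'Time'])

def count_type2(df):
    """Count rows whose 'Project Type' equals 'Type 2' (illustrative helper)."""
    return int((df['Project Type'] == 'Type 2').sum())

# DataFrame.join: the caller's 'Project' column is matched against
# the index of type_df (left join by default).
joined = time_df.join(type_df, on='Project')

# pandas 0.23.0+: once the index has a name, `on` can refer to it
# as if it were a column.
named = type_df.rename_axis('Project')
merged = time_df.merge(named, on='Project')

print(count_type2(joined), count_type2(merged))
```

Both routes should report the same count as the left_index/right_on merge, so the choice between them is mostly a matter of readability.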