I have two separate dataframes that share a project number. In type_df, the project number is the index. In time_df, the project number is a column. I would like to count the number of rows in time_df whose project has a Project Type of Type 2. I am trying to do this with pandas.merge(). It works great when both keys are columns, but not when one of them is an index. I'm not sure how to reference the index, or whether merge is even the right way to do this.
import pandas as pd

type_df = pd.DataFrame(data = [['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data = [['Project1', 13], ['Project1', 12], ['Project2', 41]],
                       columns=['Project', 'Time'])
merged = pd.merge(time_df,type_df, on=[index,'Project'])
print merged[merged['Project Type'] == 'Type 2']['Project Type'].count()
Error:
NameError: name 'index' is not defined
Desired Output:
2
The merge() function is used to merge DataFrame or named Series objects with a database-style join. The join is done on columns or indexes. If joining columns on columns, the DataFrame indexes will be ignored. Otherwise if joining indexes on indexes or indexes on a column or columns, the index will be passed on.
You can use pandas.merge() to merge DataFrames by matching their indexes. When merging two DataFrames on the index, both the left_index and right_index parameters of merge() should be True.
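As a minimal sketch of an index-on-index merge: type_df is taken from the question, while budget_df and its Budget column are hypothetical and exist only to provide a second frame indexed by project.

```python
import pandas as pd

type_df = pd.DataFrame({'Project Type': ['Type 1', 'Type 2']},
                       index=['Project2', 'Project1'])

# Hypothetical second frame (not in the question), also indexed by project
budget_df = pd.DataFrame({'Budget': [100, 250]},
                         index=['Project1', 'Project2'])

# Both keys live in the index, so both flags are set to True
both = pd.merge(type_df, budget_df, left_index=True, right_index=True)
print(both)
```

Rows are matched by index label, so the different row order in the two frames does not matter.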
To merge two pandas DataFrames on multiple columns, use pandas.merge() and pass a list of column names to on. The same operation is also available as a method, DataFrame.merge().
The main difference between merge and concat is that merge performs a structured, database-style join on keys, whereas concat is broader and less structured: it stacks objects along an axis without matching keys.
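To make the contrast concrete, here is a small sketch. The frames a and b and their column names are made up for illustration and do not come from the question.

```python
import pandas as pd

# Hypothetical frames sharing a 'key' column
a = pd.DataFrame({'key': ['x', 'y'], 'left_val': [1, 2]})
b = pd.DataFrame({'key': ['y', 'z'], 'right_val': [3, 4]})

# merge: inner join on 'key', keeps only keys present in both frames
joined = pd.merge(a, b, on='key')

# concat: stacks the rows of both frames; columns missing from one
# frame are filled with NaN
stacked = pd.concat([a, b], ignore_index=True)

print(joined)
print(stacked)
```

Here merge returns the single row for key 'y', while concat returns all four rows.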
If you want to use an index in your merge, you have to specify left_index=True or right_index=True for the frame whose index holds the key, and then use right_on or left_on to name the key column in the other frame. For you it should look something like this:
merged = pd.merge(type_df, time_df, left_index=True, right_on='Project')
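Put together with the data from the question, a complete version might look like the sketch below. The variable name type2_count is introduced here for illustration; the count is taken with a boolean mask rather than the chained indexing used in the question.

```python
import pandas as pd

type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12], ['Project2', 41]],
                       columns=['Project', 'Time'])

# Index of type_df is matched against the 'Project' column of time_df
merged = pd.merge(type_df, time_df, left_index=True, right_on='Project')

# Count merged rows whose project is of 'Type 2'
type2_count = int((merged['Project Type'] == 'Type 2').sum())
print(type2_count)  # 2
```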
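Another solution is to use DataFrame.join. Its on parameter names a column in the calling frame, which is matched against the index of the other frame, so it has to be called on time_df:
df3 = time_df.join(type_df, on='Project')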
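A short sketch of the join route on the question's data follows. It calls join on time_df, because on refers to a column of the caller that is matched against the index of the other frame; join_count is a name introduced here for illustration.

```python
import pandas as pd

type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12], ['Project2', 41]],
                       columns=['Project', 'Time'])

# 'Project' column of time_df is looked up in the index of type_df
df3 = time_df.join(type_df, on='Project')

join_count = int((df3['Project Type'] == 'Type 2').sum())
print(join_count)  # 2
```

join defaults to a left join, so every row of time_df is kept even if its project were missing from type_df.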
For pandas 0.23.0+, the on, left_on, and right_on parameters may now refer to either column names or index level names:
left_index = pd.Index(['K0', 'K0', 'K1', 'K2'], name='key1')
left = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3'],
                     'key2': ['K0', 'K1', 'K0', 'K1']},
                    index=left_index)

right_index = pd.Index(['K0', 'K1', 'K2', 'K2'], name='key1')
right = pd.DataFrame({'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3'],
                      'key2': ['K0', 'K0', 'K0', 'K1']},
                     index=right_index)

print (left)
       A   B key2
key1
K0    A0  B0   K0
K0    A1  B1   K1
K1    A2  B2   K0
K2    A3  B3   K1

print (right)
       C   D key2
key1
K0    C0  D0   K0
K1    C1  D1   K0
K2    C2  D2   K0
K2    C3  D3   K1

df = left.merge(right, on=['key1', 'key2'])
print (df)
       A   B key2   C   D
key1
K0    A0  B0   K0  C0  D0
K1    A2  B2   K0  C1  D1
K2    A3  B3   K1  C3  D3
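Applied to the question's data, the same feature could be used roughly as follows. This is a sketch assuming pandas 0.23.0 or later: the index of type_df has no name in the question, so it is given the name 'Project' with rename_axis first (typed and on_count are names introduced here).

```python
import pandas as pd

type_df = pd.DataFrame(data=[['Type 1'], ['Type 2']],
                       columns=['Project Type'],
                       index=['Project2', 'Project1'])
time_df = pd.DataFrame(data=[['Project1', 13], ['Project1', 12], ['Project2', 41]],
                       columns=['Project', 'Time'])

# Name the index so that on='Project' can refer to it as an index level
typed = type_df.rename_axis('Project')

# 'Project' is a column in time_df and an index level name in typed
merged = pd.merge(time_df, typed, on='Project')

on_count = int((merged['Project Type'] == 'Type 2').sum())
print(on_count)
```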