Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Merging on non-unique column - pandas python

I have been trying to merge two DataFrames together (df and df_details) in a similar fashion to an Excel "vlookup" but am getting strange results. Below I show the structure of the two DataFrames without populating real data for simplicity

df_details:

Abstract_Title  |  Abstract_URL  |  Session_No_v2  | Session_URL | Session_ID
  -------------------------------------------------------------------------
Abstract_Title1    Abstract_URL1         1          Session_URL1     12345
Abstract_Title2    Abstract_URL2         1          Session_URL1     12345
Abstract_Title3    Abstract_URL3         1          Session_URL1     12345
Abstract_Title4    Abstract_URL4         2          Session_URL2     22222 
Abstract_Title5    Abstract_URL5         2          Session_URL2     22222
Abstract_Title6    Abstract_URL6         3          Session_URL3     98765
Abstract_Title7    Abstract_URL7         3          Session_URL3     98765

df:

Session_Title   |   Session_URL   |   Sponsors   |    Type    |   Session_ID
    -------------------------------------------------------------------------------
Session_Title1     Session_URL1        x, y z     Paper             12345
Session_Title2     Session_URL2         x, y      Presentation      22222
Session_Title3     Session_URL3        a, b ,c    Presentation      98765
Session_Title4     Session_URL4          c        Talk              12121
Session_Title5     Session_URL5         a, x      Paper             33333

I want to merge along Session_ID and I want the final DataFrame to look like: enter image description here

I've tried the following script which yields a DataFrame that duplicates (several times) certain rows and does strange things. For example, df_details has 7,046 rows and df has 1,856 rows - when I run the following merge code, my final_df results in 21,148 rows:

final_df = pd.merge(df_details, df, how = 'outer', on = 'Session_ID')

Please help!

like image 403
Alison LT Avatar asked Dec 20 '25 04:12

Alison LT


1 Answers

To generate your final output table I used the following code:

final_df = pd.merge(df_details, df[['Session_ID',
                                'Session_Title',
                                'Sponsors',
                                'Type']], left_on = ['Session_ID'], right_on = ['Session_ID'], how = 'outer')
like image 50
Brian Avatar answered Dec 21 '25 18:12

Brian