
Pandas: control new column names when merging two dataframes?

Tags: python, pandas

I would like to merge two Pandas dataframes and control the names of the resulting columns.

I originally created the dataframes from CSV files. The original CSV files looked like this:

    # presents.csv
    org,name,items,spend...
    12A,Clerkenwell,151,435,...
    12B,Liverpool Street,37,212,...
    ...

    # trees.csv
    org,name,items,spend...
    12A,Clerkenwell,0,0,...
    12B,Liverpool Street,2,92,...
    ...

Now I have two data frames:

    df_presents = pd.read_csv(StringIO(presents_txt))
    df_trees = pd.read_csv(StringIO(trees_txt))

I want to merge them together to get a final data frame, joining on the org and name values, and then prefixing all other columns with an appropriate prefix.

    org,name,presents_items,presents_spend,trees_items,trees_spend...
    12A,Clerkenwell,151,435,0,0,...
    12B,Liverpool Street,37,212,2,92,...

I've been reading the documentation on merging and joining. This seems to merge correctly and result in the right number of columns:

    aggregate_data = pd.DataFrame.merge(df_presents, df_trees,
                                        on=['org', 'name'],
                                        how='outer')

But then doing print list(aggregate_data.columns.values) shows me the following columns:

    [u'org', u'name', u'spend_x', u'spend_y', u'items_x', u'items_y'...]

How can I rename spend_x to be presents_spend, etc?

Richard asked Dec 17 '15 15:12





1 Answer

The suffixes option in the merge function does this. The defaults are suffixes=('_x', '_y').
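As a minimal sketch of the suffixes option, assuming small sample frames modelled on the CSV files in the question (the values and the suffix strings '_presents' and '_trees' are illustrative choices, not part of the original data). Note that suffixes are appended to the overlapping column names, so this gives spend_presents rather than presents_spend:

```python
from io import StringIO

import pandas as pd

# Illustrative sample data shaped like presents.csv and trees.csv.
presents_txt = "org,name,items,spend\n12A,Clerkenwell,151,435\n12B,Liverpool Street,37,212\n"
trees_txt = "org,name,items,spend\n12A,Clerkenwell,0,0\n12B,Liverpool Street,2,92\n"

df_presents = pd.read_csv(StringIO(presents_txt))
df_trees = pd.read_csv(StringIO(trees_txt))

# suffixes replaces the default ('_x', '_y') on overlapping non-key columns.
merged = pd.merge(
    df_presents,
    df_trees,
    on=["org", "name"],
    how="outer",
    suffixes=("_presents", "_trees"),
)

print(list(merged.columns))
# ['org', 'name', 'items_presents', 'spend_presents', 'items_trees', 'spend_trees']
```

The join keys are left untouched; only columns that exist in both frames and are not part of the join receive a suffix.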

In general, renaming columns can be done with the rename method.
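To get prefixes such as presents_spend, one possible approach is to rename the non-key columns of each frame before merging, so that no names overlap and no suffix is needed. The sketch below assumes illustrative sample data, and prefix_columns is a hypothetical helper written for this example, not a pandas function:

```python
from io import StringIO

import pandas as pd

# Illustrative sample data shaped like presents.csv and trees.csv.
presents_txt = "org,name,items,spend\n12A,Clerkenwell,151,435\n12B,Liverpool Street,37,212\n"
trees_txt = "org,name,items,spend\n12A,Clerkenwell,0,0\n12B,Liverpool Street,2,92\n"

df_presents = pd.read_csv(StringIO(presents_txt))
df_trees = pd.read_csv(StringIO(trees_txt))

keys = ["org", "name"]


def prefix_columns(df, prefix, keys):
    """Hypothetical helper: return a copy of df with non-key columns renamed to '<prefix>_<column>'."""
    return df.rename(columns=lambda col: col if col in keys else f"{prefix}_{col}")


# After renaming, only the join keys are shared, so merge adds no suffixes.
merged = pd.merge(
    prefix_columns(df_presents, "presents", keys),
    prefix_columns(df_trees, "trees", keys),
    on=keys,
    how="outer",
)

print(list(merged.columns))
# ['org', 'name', 'presents_items', 'presents_spend', 'trees_items', 'trees_spend']
```

Renaming before the merge avoids having to map each generated _x and _y name back to its source afterwards.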

itzy answered Oct 02 '22 17:10
