I would like to merge two Pandas dataframes together and control the names of the new columns.
I originally created the dataframes from CSV files. The original CSV files looked like this:
    # presents.csv
    org,name,items,spend...
    12A,Clerkenwell,151,435,...
    12B,Liverpool Street,37,212,...
    ...

    # trees.csv
    org,name,items,spend...
    12A,Clerkenwell,0,0,...
    12B,Liverpool Street,2,92,...
    ...
Now I have two data frames:
    df_presents = pd.read_csv(StringIO(presents_txt))
    df_trees = pd.read_csv(StringIO(trees_txt))
I want to merge them together to get a final data frame, joining on the org and name values, and then prefixing all other columns with an appropriate prefix:
    org,name,presents_items,presents_spend,trees_items,trees_spend...
    12A,Clerkenwell,151,435,0,0,...
    12B,Liverpool Street,37,212,2,92,...
I've been reading the documentation on merging and joining. This seems to merge correctly and result in the right number of columns:
    ad = pd.DataFrame.merge(df_presents, df_trees, on=['org', 'name'], how='outer')
But then doing print(list(ad.columns.values)) shows me the following columns:
    [u'org', u'name', u'spend_x', u'spend_y', u'items_x', u'items_y'...]
How can I rename spend_x to be presents_spend, etc?
When the joining columns have different names in the two dataframes, pass the left_on and right_on arguments to the pandas merge function instead of using only the on parameter.
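As a rough illustration of left_on and right_on, here is a minimal sketch. The two small frames and the "practice" key name on the right-hand side are made up for the example and are not the question's real data.

```python
import pandas as pd

# Hypothetical frames: the key is called "org" on the left and "practice" on the right.
left = pd.DataFrame({"org": ["12A", "12B"], "items": [151, 37]})
right = pd.DataFrame({"practice": ["12A", "12B"], "items": [0, 2]})

# left_on/right_on name the key column on each side; both key columns are kept.
merged = left.merge(right, left_on="org", right_on="practice", how="outer")
print(list(merged.columns))
```

In the question itself both files use the same key names (org and name), so on=['org', 'name'] is enough there.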
Consider a = pd.DataFrame({'d':[1], 'b':[2]}).rename(columns={'b':'d'}) and b = pd.DataFrame({'d':[4, 6]}) then pd.
The suffixes option in the merge function does this. The defaults are suffixes=('_x', '_y').
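A minimal sketch of the suffixes option, assuming the CSV files contain only the four columns shown in the question (the real files are truncated there, so the inline text below is a stand-in):

```python
import pandas as pd
from io import StringIO

# Stand-in CSV text; only the columns visible in the question are assumed.
presents_txt = "org,name,items,spend\n12A,Clerkenwell,151,435\n12B,Liverpool Street,37,212\n"
trees_txt = "org,name,items,spend\n12A,Clerkenwell,0,0\n12B,Liverpool Street,2,92\n"

df_presents = pd.read_csv(StringIO(presents_txt))
df_trees = pd.read_csv(StringIO(trees_txt))

# suffixes replaces the default ('_x', '_y') on overlapping non-key columns.
ad = pd.merge(df_presents, df_trees, on=["org", "name"], how="outer",
              suffixes=("_presents", "_trees"))
print(list(ad.columns))
# ['org', 'name', 'items_presents', 'spend_presents', 'items_trees', 'spend_trees']
```

Note that suffixes are appended after the column name, so this gives items_presents rather than the presents_items asked for in the question.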
In general, renaming columns can be done with the rename method.
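To get true prefixes such as presents_spend, one option is to rename the non-key columns before merging. The sketch below is only one way to do it; the helper prefix_non_keys is a hypothetical name introduced for this example, and the frames hold just the two rows shown in the question.

```python
import pandas as pd

# Small stand-ins for the question's frames (only the columns shown there).
df_presents = pd.DataFrame({"org": ["12A", "12B"],
                            "name": ["Clerkenwell", "Liverpool Street"],
                            "items": [151, 37], "spend": [435, 212]})
df_trees = pd.DataFrame({"org": ["12A", "12B"],
                         "name": ["Clerkenwell", "Liverpool Street"],
                         "items": [0, 2], "spend": [0, 92]})

keys = ["org", "name"]

def prefix_non_keys(df, prefix):
    """Hypothetical helper: return a copy of df with non-key columns prefixed."""
    return df.rename(columns=lambda c: c if c in keys else prefix + c)

# After prefixing, no non-key names overlap, so merge adds no suffixes.
ad = pd.merge(prefix_non_keys(df_presents, "presents_"),
              prefix_non_keys(df_trees, "trees_"),
              on=keys, how="outer")
print(list(ad.columns))
# ['org', 'name', 'presents_items', 'presents_spend', 'trees_items', 'trees_spend']
```

Renaming after the merge also works, for example ad.rename(columns={'spend_x': 'presents_spend'}), but that requires listing each suffixed column by hand.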