In the example below, I can get the merge to run correctly, but how do I keep the second key column (`cities`) from printing as well? Do I have to add a separate line of code like this?
df_merge = df_merge.drop(columns='cities')
Can't I choose which columns I want to merge into the left dataset? What if df2 had 30 columns and I only want 10 of them?
import pandas as pd
df1 = pd.DataFrame({
"city": ['new york','chicago', 'orlando','ottawa'],
"humidity": [35,69,79,99]
})
df2 = pd.DataFrame({
"cities": ['new york', 'chicago', 'toronto'],
"temp": [1, 6, -35]
})
df_merge = df1.merge(df2, left_on='city', right_on='cities', how='left')
print(df_merge)
**output**
city humidity cities temp
0 new york 35 new york 1.0
1 chicago 69 chicago 6.0
2 orlando 79 NaN NaN
3 ottawa 99 NaN NaN
The most straightforward way to drop a pandas DataFrame index is the `.reset_index()` method. It replaces the index with a default integer index running from 0 to n - 1, where n is the number of rows in the DataFrame. By default, it also inserts the old index into the DataFrame as a new column; pass `drop=True` to discard the old index instead.
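A minimal sketch of the two `reset_index()` behaviours described above, using a small hypothetical frame with a non-default index so the effect is visible. Note that this only concerns the row index (the leftmost numbers in printed output); it does not remove the `cities` column produced by the merge.

```python
import pandas as pd

# Hypothetical frame with a non-default index, so the reset is visible
df = pd.DataFrame({"humidity": [35, 69, 79]}, index=[10, 20, 30])

# Default: the old index is moved into a new column named "index"
kept = df.reset_index()

# drop=True: the old index is discarded instead of becoming a column
dropped = df.reset_index(drop=True)

print(kept)
print(dropped)
```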
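`drop()` returns a Series with the specified index labels removed; that is, it removes elements of a Series by index label. On a DataFrame, `drop(columns=...)` removes columns by name, which is what `df_merge.drop(columns='cities')` in the question does.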
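A minimal sketch that chains `drop(columns=...)` onto the merge from the question, so the redundant `cities` key column is removed in the same expression rather than in a separate statement. The data is the question's own.

```python
import pandas as pd

df1 = pd.DataFrame({
    "city": ['new york', 'chicago', 'orlando', 'ottawa'],
    "humidity": [35, 69, 79, 99]
})
df2 = pd.DataFrame({
    "cities": ['new york', 'chicago', 'toronto'],
    "temp": [1, 6, -35]
})

# Merge on differently named keys, then drop the right-hand key column in one chain
df_merge = (
    df1.merge(df2, left_on='city', right_on='cities', how='left')
       .drop(columns='cities')
)
print(df_merge)
```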
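You can copy the left DataFrame's index into a regular column and then merge, since a column-on-column merge returns a fresh default index rather than keeping the left one. I found this simple method very useful when working with large DataFrames and using `pd.merge_asof()` (or `dd.merge_asof()` in Dask).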
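One possible reading of that approach, as a sketch: the left frame's index values and the column name `orig_index` are hypothetical, chosen only to show the index surviving the merge.

```python
import pandas as pd

# Hypothetical left frame whose index we want to keep
df1 = pd.DataFrame(
    {"city": ['new york', 'chicago', 'orlando', 'ottawa'], "humidity": [35, 69, 79, 99]},
    index=[101, 102, 103, 104],
)
df2 = pd.DataFrame({"cities": ['new york', 'chicago', 'toronto'], "temp": [1, 6, -35]})

# Copy the left index into a column (hypothetical name "orig_index") before merging
left = df1.assign(orig_index=df1.index)

merged = (
    left.merge(df2, left_on='city', right_on='cities', how='left')
        .drop(columns='cities')
        .set_index('orig_index')  # restore the original left index
)
print(merged)
```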
merge
Rename `df2`'s key column first, so both frames share the name `city`:
df1.merge(df2.rename(columns={'cities': 'city'}), how='left')
city humidity temp
0 new york 35 1.0
1 chicago 69 6.0
2 orlando 79 NaN
3 ottawa 99 NaN
If you need to explicitly state what you're merging on:
df1.merge(df2.rename(columns={'cities': 'city'}), how='left', on='city')
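For the second part of the question (a wide `df2` where only some columns are wanted), one option is to select the key plus the wanted columns before renaming and merging. A sketch, assuming a hypothetical `df2` with extra `wind` and `pressure` columns; `wanted` is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({
    "city": ['new york', 'chicago', 'orlando', 'ottawa'],
    "humidity": [35, 69, 79, 99]
})
# Hypothetical wider right frame; only "temp" and "wind" are wanted
df2 = pd.DataFrame({
    "cities": ['new york', 'chicago', 'toronto'],
    "temp": [1, 6, -35],
    "wind": [10, 20, 30],
    "pressure": [1010, 1005, 990],
})

wanted = ['temp', 'wind']  # columns of df2 to bring across

# Keep the key plus the wanted columns, rename the key to match df1, then merge
df_merge = df1.merge(
    df2[['cities'] + wanted].rename(columns={'cities': 'city'}),
    how='left',
    on='city',
)
print(df_merge)
```

Listing the columns explicitly scales to the "10 out of 30" case: only the key and the entries in `wanted` ever reach the merge.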
join
Set the index of the right side first; for `join`, `how='left'` is the default.
df1.join(df2.set_index('cities'), on='city')
city humidity temp
0 new york 35 1.0
1 chicago 69 6.0
2 orlando 79 NaN
3 ottawa 99 NaN
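The same column selection works with `join`: after `set_index`, index the right frame with a list of the columns to keep. A sketch, assuming a hypothetical extra `wind` column in `df2` that should not be carried over.

```python
import pandas as pd

df1 = pd.DataFrame({
    "city": ['new york', 'chicago', 'orlando', 'ottawa'],
    "humidity": [35, 69, 79, 99]
})
# Hypothetical right frame with a column we do not want
df2 = pd.DataFrame({
    "cities": ['new york', 'chicago', 'toronto'],
    "temp": [1, 6, -35],
    "wind": [10, 20, 30],
})

# Move the key into the index and keep only the wanted column(s)
right = df2.set_index('cities')[['temp']]

# on='city' matches df1's "city" column against right's index; how='left' is the default
df_join = df1.join(right, on='city')
print(df_join)
```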
map
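Make a dictionary from `df2`. `dict(df2.values)` works here because `df2` has exactly two columns, so each row becomes a (key, value) pair.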
df1.assign(temp=df1.city.map(dict(df2.values)))
city humidity temp
0 new york 35 1.0
1 chicago 69 6.0
2 orlando 79 NaN
3 ottawa 99 NaN
Less cute, more explicit
df1.assign(temp=df1.city.map(dict(df2.set_index('cities').temp)))
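`Series.map` also accepts a Series directly (its index supplies the keys, its values the results), so the explicit `dict(...)` step can be skipped. A minimal sketch with the question's data; `temp_by_city` is an illustrative name.

```python
import pandas as pd

df1 = pd.DataFrame({
    "city": ['new york', 'chicago', 'orlando', 'ottawa'],
    "humidity": [35, 69, 79, 99]
})
df2 = pd.DataFrame({
    "cities": ['new york', 'chicago', 'toronto'],
    "temp": [1, 6, -35]
})

# A Series keyed by city name acts as the lookup table for map()
temp_by_city = df2.set_index('cities')['temp']

# Cities missing from the lookup (orlando, ottawa) become NaN
df_map = df1.assign(temp=df1['city'].map(temp_by_city))
print(df_map)
```

Each `map` call fills a single column, so this approach suits pulling one or a few columns rather than many.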