I'm just getting into pandas and I am trying to add a new column to an existing dataframe.
I have two dataframes where the index of one data frame links to a column in another dataframe. Where these values are equal I need to put the value of another column in the source dataframe in a new column of the destination column.
The code section below illustrates what I mean. The commented part is what I need as an output.
I guess I need the .loc[]
function.
Another, minor, question: is it bad practice to have a non-unique indexes?
import pandas as pd
d = {'key':['a', 'b', 'c'],
'bar':[1, 2, 3]}
d2 = {'key':['a', 'a', 'b'],
'other_data':['10', '20', '30']}
df = pd.DataFrame(d)
df2 = pd.DataFrame(data = d2)
df2 = df2.set_index('key')
print df2
## other_data new_col
##key
##a 10 1
##a 20 1
##b 30 2
Use rename index
by Series
:
df2['new'] = df2.rename(index=df.set_index('key')['bar']).index
print (df2)
other_data new
key
a 10 1
a 20 1
b 30 2
Or map
:
df2['new'] = df2.index.to_series().map(df.set_index('key')['bar'])
print (df2)
other_data new
key
a 10 1
a 20 1
b 30 2
If want better performance, the best is avoid duplicates in index. Also some function like reindex
failed in duplicates index.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With