Adding values in a new column based on indexes with pandas in Python

Tags: python, pandas

I'm just getting into pandas and I am trying to add a new column to an existing dataframe.

I have two dataframes where the index of one links to a column in the other. Where these values are equal, I need to put the value of another column of the source dataframe into a new column of the destination dataframe.

The code section below illustrates what I mean. The commented part is what I need as an output.

I guess I need the .loc[] function.

Another, minor, question: is it bad practice to have a non-unique index?

import pandas as pd

d = {'key':['a',  'b', 'c'], 
     'bar':[1, 2, 3]}

d2 = {'key':['a', 'a', 'b'],
      'other_data':['10', '20', '30']}

df = pd.DataFrame(d)
df2 = pd.DataFrame(data = d2)
df2 = df2.set_index('key')

print(df2)

##    other_data  new_col
##key           
##a            10   1
##a            20   1
##b            30   2
ArnJac asked Dec 06 '22 13:12


1 Answer

Rename the index using a Series, then read the new index back:

df2['new'] = df2.rename(index=df.set_index('key')['bar']).index
print (df2)

    other_data  new
key                
a           10    1
a           20    1
b           30    2

Or map:

df2['new'] = df2.index.to_series().map(df.set_index('key')['bar'])
print (df2)

    other_data  new
key                
a           10    1
a           20    1
b           30    2
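
For reference, here is a self-contained sketch of the same lookup using Index.map directly, which skips the to_series() step. The variable name lookup is only illustrative and is not part of the original code. It assumes the keys in df are unique; labels of df2 that are missing from df would come back as NaN.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({'key': ['a', 'b', 'c'], 'bar': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['a', 'a', 'b'],
                    'other_data': ['10', '20', '30']}).set_index('key')

# Lookup Series (illustrative name): index is 'key', values are 'bar'
lookup = df.set_index('key')['bar']

# Index.map accepts a Series and looks up each index label in it
df2['new'] = df2.index.map(lookup)
print(df2)
```

Because the result of Index.map has no index of its own, the values are assigned by position, so the duplicate labels in df2 do not cause alignment problems here.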

For better performance, it is best to avoid duplicates in the index. Also, some methods, such as reindex, fail on an index with duplicates.
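
A minimal sketch of the duplicate-index problem, using a small made-up Series rather than the data from the question: Index.is_unique reports whether labels repeat, reindex raises a ValueError when they do, and Index.duplicated can be used to drop the repeats first. Keeping the first occurrence is just one possible choice, assumed here for illustration.

```python
import pandas as pd

# Hypothetical example data with a repeated label 'a'
s = pd.Series([10, 20, 30], index=['a', 'a', 'b'])

# Quick check for a non-unique index
has_unique_index = s.index.is_unique
print(has_unique_index)

# reindex cannot align against duplicate labels and raises ValueError
try:
    s.reindex(['a', 'b', 'c'])
    reindex_failed = False
except ValueError as exc:
    reindex_failed = True
    print('reindex failed:', exc)

# One option: keep only the first row for each label, then reindex
deduped = s[~s.index.duplicated(keep='first')]
result = deduped.reindex(['a', 'b', 'c'])
print(result)
```

Whether dropping rows is acceptable depends on the data; in the question both 'a' rows carry different other_data, so there a regular column for key may be the safer layout.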

jezrael answered May 24 '23 05:05