Elegant way to replace values in pandas.DataFrame from another DataFrame

Tags: python, pandas, apply

I have a DataFrame in which I want to replace the values in one column with values from another DataFrame.

import pandas as pd

df = pd.DataFrame({'id1': [1001,1002,1001,1003,1004,1005,1002,1006],
                   'value1': ["a","b","c","d","e","f","g","h"],
                   'value3': ["yes","no","yes","no","no","no","yes","no"]})

dfReplace = pd.DataFrame({'id2': [1001,1002],
                   'value2': ["rep1","rep2"]})

I need to match rows on a common key, and my current solution uses a groupby with a loop. Is there a more elegant (and faster) way to do this, e.g. with .map() or .apply()? I initially wanted to use DataFrame.update(), but that doesn't seem to be the right tool.

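# Group by the scalar key so each `key` is a plain id, not a 1-tuple
groups = dfReplace.groupby('id2')

for key, group in groups:
    # Overwrite value1 on every row of df whose id1 matches this key
    df.loc[df['id1'] == key, 'value1'] = group['value2'].values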

Output

df
    id1   value1 value3
0   1001  rep1   yes
1   1002  rep2   no
2   1001  rep1   yes
3   1003  d      no
4   1004  e      no
5   1005  f      no
6   1002  rep2   yes
7   1006  h      no
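Since the question mentions .map(), here is a minimal sketch of how the loop could be written with Series.map(). It assumes the id2 values in dfReplace are unique; the name `lookup` is only illustrative and not part of the original code.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1001, 1002, 1001, 1003, 1004, 1005, 1002, 1006],
                   'value1': ["a", "b", "c", "d", "e", "f", "g", "h"],
                   'value3': ["yes", "no", "yes", "no", "no", "no", "yes", "no"]})

dfReplace = pd.DataFrame({'id2': [1001, 1002],
                          'value2': ["rep1", "rep2"]})

# Lookup Series: index is id2, values are value2 (assumes id2 is unique)
lookup = dfReplace.set_index('id2')['value2']

# map() gives NaN where id1 has no match; fillna() restores the original value1
df['value1'] = df['id1'].map(lookup).fillna(df['value1'])
print(df)
```

One caveat with this sketch: a replacement value that is itself NaN would be treated as "no match", so the original value1 would be kept for that row.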

298

asked Mar 12 '16 16:03

iboboboru

2 Answers

Try merge():

merge = df.merge(dfReplace, left_on='id1', right_on='id2', how='left')
print(merge)

# .ix was removed in pandas 1.0; .loc with a boolean mask works here
merge.loc[merge.id1 == merge.id2, 'value1'] = merge.value2
print(merge)

del merge['id2']
del merge['value2']
print(merge)

Output:

    id1 value1 value3   id2 value2
0  1001      a    yes  1001   rep1
1  1002      b     no  1002   rep2
2  1001      c    yes  1001   rep1
3  1003      d     no   NaN    NaN
4  1004      e     no   NaN    NaN
5  1005      f     no   NaN    NaN
6  1002      g    yes  1002   rep2
7  1006      h     no   NaN    NaN

    id1 value1 value3   id2 value2
0  1001   rep1    yes  1001   rep1
1  1002   rep2     no  1002   rep2
2  1001   rep1    yes  1001   rep1
3  1003      d     no   NaN    NaN
4  1004      e     no   NaN    NaN
5  1005      f     no   NaN    NaN
6  1002   rep2    yes  1002   rep2
7  1006      h     no   NaN    NaN

    id1 value1 value3
0  1001   rep1    yes
1  1002   rep2     no
2  1001   rep1    yes
3  1003      d     no
4  1004      e     no
5  1005      f     no
6  1002   rep2    yes
7  1006      h     no
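The three merge steps above can probably be condensed. The following is a sketch rather than the answerer's code: it assumes id2 is unique in dfReplace (so the left merge keeps one row per row of df) and uses the illustrative names `merged` and `result`.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1001, 1002, 1001, 1003, 1004, 1005, 1002, 1006],
                   'value1': ["a", "b", "c", "d", "e", "f", "g", "h"],
                   'value3': ["yes", "no", "yes", "no", "no", "no", "yes", "no"]})

dfReplace = pd.DataFrame({'id2': [1001, 1002],
                          'value2': ["rep1", "rep2"]})

# Left merge keeps every row of df; unmatched rows get NaN in id2/value2
merged = df.merge(dfReplace, left_on='id1', right_on='id2', how='left')

# Prefer the replacement where one exists, otherwise keep the original value1
merged['value1'] = merged['value2'].fillna(merged['value1'])

# Drop the helper columns brought in by the merge
result = merged.drop(columns=['id2', 'value2'])
print(result)
```

Using fillna() on value2 avoids comparing id1 with id2, which the merge has already matched.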

179

answered Oct 13 '22 16:10

MaxU - stop WAR against UA

This is a little cleaner if you already have the indexes set to the id columns, but if not you can still do it in one line:

>>> (dfReplace.set_index('id2').rename( columns = {'value2':'value1'} )
                               .combine_first(df.set_index('id1')))

     value1 value3
1001   rep1    yes
1001   rep1    yes
1002   rep2     no
1002   rep2    yes
1003      d     no
1004      e     no
1005      f     no
1006      h     no

If you separate into three lines and do the renaming and re-indexing separately, you can see that the combine_first() by itself is actually very simple:

>>> df = df.set_index('id1')
>>> dfReplace = dfReplace.set_index('id2').rename( columns={'value2':'value1'} )

>>> dfReplace.combine_first(df)
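Note that the combine_first() result is indexed by the id values, and its row order may differ from the original frame. If id1 is needed back as a regular column, something like the sketch below may help; the names `left`, `right` and `combined` are illustrative, and the rename_axis() step is an addition that is not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'id1': [1001, 1002, 1001, 1003, 1004, 1005, 1002, 1006],
                   'value1': ["a", "b", "c", "d", "e", "f", "g", "h"],
                   'value3': ["yes", "no", "yes", "no", "no", "no", "yes", "no"]})

dfReplace = pd.DataFrame({'id2': [1001, 1002],
                          'value2': ["rep1", "rep2"]})

# Index both frames by their id column and give the replacement column the same name
left = df.set_index('id1')
right = dfReplace.set_index('id2').rename(columns={'value2': 'value1'})

# Values from `right` win; gaps are filled from `left`
combined = right.combine_first(left)

# Name the index and turn it back into an ordinary column
combined = combined.rename_axis('id1').reset_index()
print(combined)
```

If the original row order matters, the merge-based approach is likely the safer choice, since a left merge keeps the rows of df in place.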

answered Oct 13 '22 14:10

JohnE
