Pandas DataFrame updating Column values with other DataFrame

Tags:

pandas

Consider the following DataFrame X:

Col A Col B 
1     2
3     4
5     6

And the DataFrame Y:

Col A Col B 
3     7
8     9

Is there a built-in function in pandas that combines the two DataFrames using Col A as the key, updating the value in Col B where the key already exists and appending the row otherwise? The output of this function on X and Y would be:

Col A Col B
1     2
3     7
5     6
8     9

I've looked into merge, update, and append, but none of them behaves the way I want: update matches on the index instead of the Col A value, merge doesn't overwrite, etc. Thanks!

TheoretiCAL asked Jun 17 '13 21:06

1 Answer

One way to do this is to concat then drop the duplicates:

In [11]: df = pd.concat([dfX, dfY])

In [12]: df
Out[12]:
   ColA  ColB
0     1     2
1     3     4
2     5     6
0     3     7
1     8     9

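In [13]: df.drop_duplicates(subset=['ColA'], keep='last')
Out[13]:
   ColA  ColB
0     1     2
2     5     6
0     3     7
1     8     9

Note: keep='last' means you are "updating from dfY", because the dfY rows come last in the concatenated frame. In older pandas versions these arguments were spelled cols=['ColA'] and take_last=True.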
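The concat-then-deduplicate approach can be wrapped up as a small helper. This is only a sketch: the function name upsert is hypothetical, the frames are rebuilt from the values in the question, and it uses the subset and keep='last' parameters of drop_duplicates as found in recent pandas releases. It also sorts on the key and resets the index so the result matches the row order asked for in the question, which the plain drop_duplicates call does not do.

```python
import pandas as pd

# Example frames rebuilt from the question (column names as used in the answer).
dfX = pd.DataFrame({"ColA": [1, 3, 5], "ColB": [2, 4, 6]})
dfY = pd.DataFrame({"ColA": [3, 8], "ColB": [7, 9]})


def upsert(base, updates, key):
    """Hypothetical helper: rows in `updates` replace rows in `base`
    that share the same `key` value; unmatched rows are appended."""
    # Put the updates last so that keep="last" prefers them.
    combined = pd.concat([base, updates])
    deduped = combined.drop_duplicates(subset=[key], keep="last")
    # Sort by the key and rebuild a clean 0..n-1 index.
    return deduped.sort_values(key).reset_index(drop=True)


result = upsert(dfX, dfY, "ColA")
print(result)
```

If the original row order matters more than key order, the sort_values step can be dropped, at the cost of updated rows moving to the end.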

Andy Hayden answered Oct 25 '22 03:10