pandas DataFrame concat / update ("upsert")?

Tags:

python

pandas

I am looking for an elegant way to append all the rows from one DataFrame to another DataFrame (both DataFrames having the same index and column structure), but in cases where the same index value appears in both DataFrames, use the row from the second DataFrame.

So, for example, if I start with:

    df1:
                        A      B
        date
        '2015-10-01'  'A1'   'B1'
        '2015-10-02'  'A2'   'B2'
        '2015-10-03'  'A3'   'B3'

    df2:
        date            A      B
        '2015-10-02'  'a1'   'b1'
        '2015-10-03'  'a2'   'b2'
        '2015-10-04'  'a3'   'b3'

I would like the result to be:

                        A      B
        date
        '2015-10-01'  'A1'   'B1'
        '2015-10-02'  'a1'   'b1'
        '2015-10-03'  'a2'   'b2'
        '2015-10-04'  'a3'   'b3'

This is analogous to what I think is called "upsert" in some SQL systems: a combination of update and insert, in the sense that each row from df2 is either (a) used to update an existing row in df1 if the row key already exists in df1, or (b) inserted into df1 at the end if the row key does not already exist.

I have come up with the following:

    (
        pd.concat([df1, df2])     # concat the two DataFrames
        .reset_index()            # turn 'date' into a regular column
        .groupby('date')          # group rows by values in the 'date' column
        .tail(1)                  # take the last row in each group
        .set_index('date')        # restore 'date' as the index
    )

This seems to work, but it relies on the rows within each groupby group keeping the same order as in the original DataFrames, which I haven't checked, and it seems displeasingly convoluted.

Does anyone have any ideas for a more straightforward solution?

635

asked Oct 07 '15 20:10

embeepea

1 Answer

One solution is to concatenate df1 with the new rows in df2 (i.e. those whose index does not appear in df1), then update the values with those from df2.

    df = pd.concat([df1, df2[~df2.index.isin(df1.index)]])
    df.update(df2)

    >>> df
                 A   B
    2015-10-01  A1  B1
    2015-10-02  a1  b1
    2015-10-03  a2  b2
    2015-10-04  a3  b3
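For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the two frames from the question and applies the two steps above. The way df1 and df2 are constructed (string dates in an index named 'date') is an assumption made for illustration; the question does not show how they were created.

```python
import pandas as pd

# Assumed construction of the question's example frames (string index named 'date').
df1 = pd.DataFrame(
    {"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]},
    index=pd.Index(["2015-10-01", "2015-10-02", "2015-10-03"], name="date"),
)
df2 = pd.DataFrame(
    {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]},
    index=pd.Index(["2015-10-02", "2015-10-03", "2015-10-04"], name="date"),
)

# Step 1: append only the rows of df2 whose index is not already in df1.
df = pd.concat([df1, df2[~df2.index.isin(df1.index)]])

# Step 2: overwrite the overlapping rows in place with df2's values.
df.update(df2)

print(df)
```

One thing to keep in mind: DataFrame.update only copies non-NA values, so a NaN in df2 would leave the corresponding df1 value untouched rather than replacing it.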

EDIT: Per the suggestion of @chrisb, this can further be simplified as follows:

    pd.concat([df1[~df1.index.isin(df2.index)], df2])

Thanks Chris!
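The simplified one-liner can be wrapped in a small helper, sketched below. The function name upsert and the construction of the example frames are hypothetical, chosen only for illustration; the body is just the concat expression from the edit above.

```python
import pandas as pd


def upsert(old: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: rows of `old` not keyed in `new`, then all rows of `new`."""
    kept = old[~old.index.isin(new.index)]  # drop rows that `new` will replace
    return pd.concat([kept, new])           # append every row of `new`


# Assumed construction of the question's example frames.
df1 = pd.DataFrame(
    {"A": ["A1", "A2", "A3"], "B": ["B1", "B2", "B3"]},
    index=pd.Index(["2015-10-01", "2015-10-02", "2015-10-03"], name="date"),
)
df2 = pd.DataFrame(
    {"A": ["a1", "a2", "a3"], "B": ["b1", "b2", "b3"]},
    index=pd.Index(["2015-10-02", "2015-10-03", "2015-10-04"], name="date"),
)

result = upsert(df1, df2)
print(result)
```

Two differences from the concat-then-update version are worth knowing. Replaced rows end up in df2's position at the end rather than in their original df1 position, so chain .sort_index() if the order matters. Also, whole rows are taken from df2, so any NaN in df2 carries over into the result.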

102

answered Sep 21 '22 03:09

Alexander
