Append two data frames with pandas

Tags: python, pandas

When I try to merge two dataframes by rows doing:

bigdata = data1.append(data2)

I get the following error:

Exception: Index cannot contain duplicate values!

The index of the first data frame runs from 0 to 38 and that of the second from 0 to 48. I understand that I have to modify the index of one of the data frames before merging, but I don't know how to.

Thank you.

These are the two dataframes:

data1:

    meta  particle  ratio   area    type
0   2     part10    1.348   0.8365  touching
1   2     part18    1.558   0.8244  single
2   2     part2     1.893   0.894   single
3   2     part37    0.6695  1.005   single
....clip...
36  2     part23    1.051   0.8781  single
37  2     part3     80.54   0.9714  nuclei
38  2     part34    1.071   0.9337  single

data2:

    meta  particle  ratio    area    type
0   3     part10    0.4756   1.025   single
1   3     part18    0.04387  1.232   dusts
2   3     part2     1.132    0.8927  single
...clip...
46  3     part46    13.71    1.001   nuclei
47  3     part3     0.7439   0.9038  single
48  3     part34    0.4349   0.9956  single

The first column is the index.

481

asked Oct 15 '11 08:10

Jean-Pat

2 Answers

The append function has an optional argument, ignore_index, which you should set to True here to join the records together, since the index isn't meaningful for your application.
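A minimal sketch of this approach, assuming a recent pandas release: DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so the example uses pd.concat, which accepts the same ignore_index argument. On older versions, data1.append(data2, ignore_index=True) should behave the same way. The small frames below are made-up stand-ins for the asker's data, not the real tables.

```python
import pandas as pd

# Hypothetical stand-ins for the asker's frames; both carry a default 0..n-1 index.
data1 = pd.DataFrame({
    "meta": [2, 2, 2],
    "particle": ["part10", "part18", "part2"],
    "ratio": [1.348, 1.558, 1.893],
})
data2 = pd.DataFrame({
    "meta": [3, 3],
    "particle": ["part10", "part18"],
    "ratio": [0.4756, 0.04387],
})

# ignore_index=True discards both original indexes and assigns a fresh 0..n-1 index,
# so the overlapping labels 0 and 1 no longer collide.
bigdata = pd.concat([data1, data2], ignore_index=True)

print(bigdata.index.tolist())
```

The rows keep their original order (all of data1, then all of data2); only the index labels are replaced.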

165

answered Sep 27 '22 21:09

Wes McKinney

You could first identify the rows with duplicated index labels (not duplicated values) using the groupby method, and then apply a sum or mean operation to all the rows that share an index label.

data1 = data1.groupby(data1.index).sum()
data2 = data2.groupby(data2.index).sum()
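To illustrate what the groupby step does, here is a small sketch on a made-up frame in which the index label 0 appears twice. It assumes numeric columns only; string columns such as particle and type are not meaningfully summed and would need separate handling. Note that this merges rows that share a label instead of keeping them side by side, so it changes the data in a way that plain concatenation does not.

```python
import pandas as pd

# Hypothetical frame whose index label 0 appears twice (numeric columns only).
df = pd.DataFrame(
    {"ratio": [1.0, 2.0, 4.0], "area": [0.5, 0.25, 1.0]},
    index=[0, 0, 1],
)

# Group rows by index label and sum each group, leaving one row per label.
collapsed = df.groupby(df.index).sum()

print(collapsed)
```

Replacing .sum() with .mean() would average the duplicated rows instead.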

answered Sep 27 '22 21:09

Madcat
