

Merge two rows in the same DataFrame if their index is the same?

Tags:

python

pandas

I have created a large DataFrame by pulling data from an Azure database. Building it wasn't simple: I had to do it in parts, using the concat function to add new columns to the data set as they were pulled from the database.

This worked fine; however, I am indexing by entry date, and when concatenating I sometimes get two rows with the same index. Is it possible to merge rows that share the same index? I have searched online, but I only find examples that merge two separate DataFrames rather than rows within the same DataFrame.

In summary:

This

                      Col1  Col2
2015-10-27 22:22:31   1400  
2015-10-27 22:22:31         50.5

To this

                      Col1  Col2
2015-10-27 22:22:31   1400  50.5

I have tried using the groupby function on the index, but that messed things up: most of the data columns disappeared and a few very large numbers were spat out.

Note:

The data is in this sort of format, except with many more columns and is generally quite sparse!

                        Col1    Col2    ...    Col_n-1 Col_n    
2015-10-27 21:15:60+0   1220        
2015-10-27 21:25:4+0    1420        
2015-10-27 21:28:8+0    1410        
2015-10-27 21:37:10+0           51.5    
2015-10-27 21:37:11+0   1500        
2015-10-27 21:46:14+0           51  
2015-10-27 21:46:15+0   1390        
2015-10-27 21:55:19+0   1370        
2015-10-27 22:04:24+0   1450        
2015-10-27 22:13:28+0   1350        
2015-10-27 22:22:31+0   1400        
2015-10-27 22:22:31+0           50.5
2015-10-27 22:25:33+0   1300        
2015-10-27 22:29:42+0                   ...    1900 
2015-10-27 22:29:42+0                                  63       
2015-10-27 22:34:36+0   1280


asked Oct 29 '15 10:10

HStro
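A minimal sketch that reproduces this situation, assuming the pieces were stacked with `pd.concat` along its default `axis=0` (the question does not show the exact call). The frames `col1` and `col2` are hypothetical stand-ins for two database pulls. Because each piece has different columns, concat takes the union of the columns and pads with NaN, so a timestamp present in both pieces appears on two sparse rows.

```python
import pandas as pd

# Hypothetical pieces: each query returned one column, indexed by entry date.
col1 = pd.DataFrame(
    {"Col1": [1400, 1300]},
    index=pd.to_datetime(["2015-10-27 22:22:31", "2015-10-27 22:25:33"]),
)
col2 = pd.DataFrame(
    {"Col2": [50.5]},
    index=pd.to_datetime(["2015-10-27 22:22:31"]),
)

# Stacking frames with different columns keeps every row and fills the
# missing cells with NaN, so 22:22:31 now occurs twice in the index.
df = pd.concat([col1, col2])

print(df.index.has_duplicates)  # True
print(df)
```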

2 Answers

Building on @EdChum's answer, you can also use the min_count parameter of GroupBy.sum to control how NaN values are handled. Say we add another row to the example:

                      Col1  Col2
2015-10-27 22:22:31   1400   NaN
2015-10-27 22:22:31    NaN  50.5
2022-08-02 16:00:00   1600   NaN

then,

In [184]:
df.groupby(level=0).sum(min_count=1)

Out[184]:
                     Col1  Col2
index                          
2015-10-27 22:22:31  1400  50.5
2022-08-02 16:00:00  1600   NaN

With min_count=0 (the default), a group whose values are all NaN sums to 0 instead of NaN.
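A self-contained sketch comparing the two settings on the three-row example above. The frame is rebuilt by hand, so the unnamed index here is an assumption (the output above shows an index named `index`); grouping with `level=0` works either way.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(
    ["2015-10-27 22:22:31", "2015-10-27 22:22:31", "2022-08-02 16:00:00"]
)
df = pd.DataFrame(
    {"Col1": [1400, np.nan, 1600], "Col2": [np.nan, 50.5, np.nan]},
    index=idx,
)

# min_count=1: a group needs at least one non-NaN value, otherwise the sum is NaN.
merged_nan = df.groupby(level=0).sum(min_count=1)

# min_count=0 (the default): a group with only NaN values sums to 0.
merged_zero = df.groupby(level=0).sum(min_count=0)

print(merged_nan)
print(merged_zero)
```

For sparse data like the question's, `min_count=1` is usually the safer choice, since a 0 produced from missing data is indistinguishable from a real 0 reading.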

answered Nov 15 '22 00:11

amauryvvk

You can groupby on your index and call sum:

In [184]:
df.groupby(level=0).sum()

Out[184]:
                     Col1  Col2
index                          
2015-10-27 22:22:31  1400  50.5
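A runnable version of the answer above on a small hand-built frame, plus `GroupBy.first()` as an alternative that is not part of the original answer. `first()` keeps the first non-null value per column within each group, so it never adds two readings together, and a column that is all NaN in a group stays NaN instead of becoming 0.

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(
    ["2015-10-27 22:22:31", "2015-10-27 22:22:31", "2015-10-27 22:25:33"]
)
df = pd.DataFrame(
    {"Col1": [1400, np.nan, 1300], "Col2": [np.nan, 50.5, np.nan]},
    index=idx,
)

# Collapse rows that share a timestamp by summing each column; NaN is
# skipped, so the two sparse rows fill in each other's gaps.
summed = df.groupby(level=0).sum()

# Alternative: take the first non-null value per column in each group.
# Values are never added, and an all-NaN column stays NaN (sum gives 0).
firsts = df.groupby(level=0).first()

print(summed)
print(firsts)
```

One possible cause of the disappearing columns described in the question, offered as an assumption: if the columns came back from the database as strings (object dtype), pandas does not add them numerically and, depending on the version, either drops them or concatenates the strings. Converting first, for example with `df.apply(pd.to_numeric, errors="coerce")`, is one way to rule that out.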

answered Nov 15 '22 01:11

EdChum

