Merge two rows in the same Dataframe if their index is the same?

Tags: python, pandas

I have created a large DataFrame by pulling data from an Azure database. Building it wasn't simple: I had to do it in parts, using the concat function to add new columns to the data set as they were pulled from the database.

This worked fine. However, I am indexing by entry date, and when concatenating I sometimes get two data rows with the same index. Is it possible to merge rows that share the same index? I have searched online for solutions, but I only come across examples that merge two separate DataFrames rather than rows within the same DataFrame.

In summary:

This

                      Col1  Col2
2015-10-27 22:22:31   1400  
2015-10-27 22:22:31         50.5

To this

                      Col1  Col2
2015-10-27 22:22:31   1400  50.5

I have tried using the groupby function on the index, but that just messed things up: most of the data columns disappeared and a few very large numbers were spat out.

Note:

The data is in this sort of format, except with many more columns and is generally quite sparse!

                        Col1    Col2    ...    Col_n-1 Col_n    
2015-10-27 21:15:60+0   1220        
2015-10-27 21:25:4+0    1420        
2015-10-27 21:28:8+0    1410        
2015-10-27 21:37:10+0           51.5    
2015-10-27 21:37:11+0   1500        
2015-10-27 21:46:14+0           51  
2015-10-27 21:46:15+0   1390        
2015-10-27 21:55:19+0   1370        
2015-10-27 22:04:24+0   1450        
2015-10-27 22:13:28+0   1350        
2015-10-27 22:22:31+0   1400        
2015-10-27 22:22:31+0           50.5
2015-10-27 22:25:33+0   1300        
2015-10-27 22:29:42+0                   ...    1900 
2015-10-27 22:29:42+0                                  63       
2015-10-27 22:34:36+0   1280        
HStro asked Oct 29 '15 10:10


2 Answers

Building on @EdChum's answer, you can also use the min_count parameter of GroupBy.sum to control how NaN values are handled. Suppose the example has an additional row:

                      Col1  Col2
2015-10-27 22:22:31   1400   NaN
2015-10-27 22:22:31    NaN  50.5
2022-08-02 16:00:00   1600   NaN

then,

In [184]:
df.groupby('index').sum(min_count=1)

Out[184]:
                     Col1  Col2
index                          
2015-10-27 22:22:31  1400  50.5
2022-08-02 16:00:00  1600   NaN

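With min_count=1, a group's sum is NaN unless the group contains at least one non-NaN value. Using min_count=0 (the default) will output 0 instead of NaN for those groups.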
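As a minimal, self-contained sketch of the difference, the frame below rebuilds the three example rows by hand (the variable names are only for illustration, and it groups with level=0 so it does not depend on the index being named 'index'):

```python
import numpy as np
import pandas as pd

# Rebuild the three example rows; the first two share a timestamp.
idx = pd.to_datetime([
    "2015-10-27 22:22:31",
    "2015-10-27 22:22:31",
    "2022-08-02 16:00:00",
])
df = pd.DataFrame(
    {"Col1": [1400, np.nan, 1600], "Col2": [np.nan, 50.5, np.nan]},
    index=idx,
)
dup_ts, single_ts = idx[0], idx[2]

# min_count=1: a group needs at least one non-NaN value, otherwise its sum is NaN.
keep_nan = df.groupby(level=0).sum(min_count=1)

# min_count=0 (the default): an all-NaN group sums to 0.
zero_fill = df.groupby(level=0).sum(min_count=0)

print(keep_nan)
print(zero_fill)
```

Which one is preferable probably depends on whether a 0 could be mistaken for a real measurement in your data; in a sparse frame like the one in the question, keeping NaN seems the safer choice.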

amauryvvk answered Nov 15 '22 00:11


You can group by the index and call sum:

In [184]:
df.groupby(level=0).sum()

Out[184]:
                     Col1  Col2
index                          
2015-10-27 22:22:31  1400  50.5
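A small runnable sketch of this, assuming numeric columns and a hand-built frame (the extra 22:25:33 row is taken from the question's sample). It also shows first(), which is a swapped-in alternative and not part of the answer above: it takes the first non-null value per column instead of adding values up, which may be closer to "merging" if two duplicate rows ever both hold a value in the same column.

```python
import numpy as np
import pandas as pd

# Two rows share a timestamp; a third row stands alone.
idx = pd.to_datetime([
    "2015-10-27 22:22:31",
    "2015-10-27 22:22:31",
    "2015-10-27 22:25:33",
])
df = pd.DataFrame(
    {"Col1": [1400, np.nan, 1300], "Col2": [np.nan, 50.5, np.nan]},
    index=idx,
)
df.index.name = "index"

# Collapse rows that share an index value by summing each column.
# NaN is skipped, and a column with no values in a group becomes 0.
merged_sum = df.groupby(level=0).sum()

# Alternative: keep the first non-null value per column in each group.
# A column with no values in a group stays NaN.
merged_first = df.groupby(level=0).first()

print(merged_sum)
print(merged_first)
```

If columns disappear or the sums look wrong on real data, one thing worth checking (an assumption, since the question does not show dtypes) is whether some columns came back from the database as strings rather than numbers; df.dtypes will show this.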
EdChum answered Nov 15 '22 01:11