Merge Rows within Data Frame [duplicate]

Tags: r, data.table, plyr

I have a relational dataset from which I want to extract dyadic information.

It has four columns: sender, receiver, attribute, and edge.

I want to take the repeated sender-receiver pairs and count them as additional edges.

df <- data.frame(sender = c(1,1,1,1,3,5), receiver = c(1,2,2,2,4,5), 
                attribute = c(12,12,12,12,13,13), edge = c(0,1,1,1,1,0))

  sender receiver attribute edge
1      1        1        12    0
2      1        2        12    1
3      1        2        12    1
4      1        2        12    1
5      3        4        13    1
6      5        5        13    0

I want the end result to look like this:

  sender receiver attribute edge
1      1        1        12    0
2      1        2        12    3
3      3        4        13    1

Here the rows for duplicate sender-receiver pairs have been combined, and the number of duplicates is reflected in the edge count.

Any input would be really appreciated.

Thanks!

asked May 24 '12 at 02:05 by crock1255


1 Answer

For fun, here are two other options: the first uses the base function aggregate(), and the second uses the data.table package:

> aggregate(edge ~ sender + receiver + attribute, FUN = "sum", data = df)
  sender receiver attribute edge
1      1        1        12    0
2      1        2        12    3
3      3        4        13    1
4      5        5        13    0
> require(data.table)
> dt <- data.table(df)
> dt[, list(sumedge = sum(edge)), by = "sender, receiver, attribute"]
     sender receiver attribute sumedge
[1,]      1        1        12       0
[2,]      1        2        12       3
[3,]      3        4        13       1
[4,]      5        5        13       0
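
Both calls above sum the edge column within each sender/receiver/attribute group. If the number of merged rows is also of interest (for example, when a repeated pair could carry edge = 0), a count can be carried along with the sum. The following is only a minimal base R sketch: the helper column n_rows is a hypothetical name, not part of the original question.

```r
# Same example data as in the question
df <- data.frame(sender = c(1, 1, 1, 1, 3, 5), receiver = c(1, 2, 2, 2, 4, 5),
                 attribute = c(12, 12, 12, 12, 13, 13), edge = c(0, 1, 1, 1, 1, 0))

# Hypothetical helper column: every original row counts once
df$n_rows <- 1

# Sum both the edges and the row counter within each group
res <- aggregate(cbind(edge, n_rows) ~ sender + receiver + attribute,
                 data = df, FUN = sum)
print(res)
```

In the result, edge holds the summed edges and n_rows holds how many original rows were collapsed into each sender-receiver pair.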

For the record, this question has been asked many, many times; perusing my own answers turns up several that would point you down the right path.

answered Sep 19 '22 at 08:09 by Chase