I have a relational dataset in which I'm looking for dyadic information.
It has four columns: sender, receiver, attribute, and edge.
I want to take the repeated sender-receiver pairs and convert their counts into additional edges.
df <- data.frame(sender = c(1,1,1,1,3,5), receiver = c(1,2,2,2,4,5),
attribute = c(12,12,12,12,13,13), edge = c(0,1,1,1,1,0))
sender receiver attribute edge
1 1 1 12 0
2 1 2 12 1
3 1 2 12 1
4 1 2 12 1
5 3 4 13 1
6 5 5 13 0
I want the end result to look like this:
sender receiver attribute edge
1 1 1 12 0
2 1 2 12 3
3 3 4 13 1
That is, rows with the same sender-receiver pair have been combined into one, and the number of duplicates is reflected in the edge count.
Any input would be really appreciated.
Thanks!
For fun, here are two other options: the first uses the base function aggregate(), and the second uses the data.table package:
> aggregate(edge ~ sender + receiver + attribute, FUN = "sum", data = df)
sender receiver attribute edge
1 1 1 12 0
2 1 2 12 3
3 3 4 13 1
4 5 5 13 0
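One detail worth checking: the desired output sums the edge column rather than counting rows. The 1-1 pair appears only once with edge 0, and it stays at 0 in the expected result. If you literally want the number of duplicate rows per pair, swapping sum for length gives 1 for that pair instead. A small sketch comparing the two (the names summed and counted are just illustrative):

```r
# the example data from the question
df <- data.frame(sender = c(1, 1, 1, 1, 3, 5), receiver = c(1, 2, 2, 2, 4, 5),
                 attribute = c(12, 12, 12, 12, 13, 13), edge = c(0, 1, 1, 1, 1, 0))

# summing edge per sender/receiver/attribute reproduces the desired output
summed <- aggregate(edge ~ sender + receiver + attribute, data = df, FUN = sum)

# counting rows per group gives the number of duplicates instead
counted <- aggregate(edge ~ sender + receiver + attribute, data = df, FUN = length)

print(summed)
print(counted)
```

The two results differ only for groups whose edge values are not all 1, such as the 1-1 and 5-5 pairs here.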
> require(data.table)
> dt <- data.table(df)
> dt[, list(sumedge = sum(edge)), by = "sender, receiver, attribute"]
sender receiver attribute sumedge
[1,] 1 1 12 0
[2,] 1 2 12 3
[3,] 3 4 13 1
[4,] 5 5 13 0
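If you would rather avoid an extra package and keep the pairs in their original order (aggregate() sorts its output by the grouping columns), base R's ave() and duplicated() can do the same job. This is a minimal sketch, not part of the answer above; keys, summed, and out are illustrative names:

```r
# the example data from the question
df <- data.frame(sender = c(1, 1, 1, 1, 3, 5), receiver = c(1, 2, 2, 2, 4, 5),
                 attribute = c(12, 12, 12, 12, 13, 13), edge = c(0, 1, 1, 1, 1, 0))
keys <- c("sender", "receiver", "attribute")

# ave() replaces each edge value with the sum over its sender/receiver/attribute group
summed <- df
summed$edge <- ave(df$edge, df$sender, df$receiver, df$attribute, FUN = sum)

# keep only the first row of each group, so groups stay in order of first appearance
out <- summed[!duplicated(summed[keys]), ]
rownames(out) <- NULL
print(out)
```

Because duplicated() works row-wise on a data frame, the first occurrence of each sender/receiver/attribute combination is kept and the later repeats are dropped after their edges have been summed.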
For the record, this question has been asked many, many times; perusing my own answers will turn up several that point you down the right path.