Apologies if this question has already been answered, but all the information I have been able to find deals with merging data frames themselves or with merging in a different way. I'd really appreciate any thoughts.
I have a very large but very simple data frame with approx. 22500 rows and 48 columns. I would like to merge some of the rows within the data frame based on the row names and am wondering if there is any way to do this.
A portion of the data frame looks like this:
                 Treatment1 Treatment2 Treatment3 Treatment4 Treatment5
Nasvi2EG000001t1         28         43         33         25         64
Nasvi2EG000002t2          0          3          0          0          4
Nasvi2EG000002t5          0          0          0          0          0
Nasvi2EG000002t6          0          0          0          0          0
Nasvi2EG000004t1          1          0          0          0          0
Nasvi2EG000009t1          0          4          2          0          4
Nasvi2EG000013t1         21          8         17         19          7
Nasvi2EG000014t1          0          3          0          0          4
Nasvi2EG000014t2          0          4          0          0          3
As you can see, rows 2, 3 and 4 have identical names up to the digit after the "t", and the same is true of rows 8 and 9. I'd like to merge the similarly named rows together.
What I'd like to end up with is this:
                 Treatment1 Treatment2 Treatment3 Treatment4 Treatment5
Nasvi2EG000001t1         28         43         33         25         64
Nasvi2EG000002            0          3          0          0          4
Nasvi2EG000004t1          1          0          0          0          0
Nasvi2EG000009t1          0          4          2          0          4
Nasvi2EG000013t1         21          8         17         19          7
Nasvi2EG000014            0          7          0          0          7
where the values in the rows that have been merged are summed.
Would be very grateful for any thoughts.
Thanks!
Assuming your data.frame is called "SODF", create a vector from the row.names that strips out the "t+some digit" from the end of the row.names and use that as your aggregation variable.
> aggvar <- gsub("(t[0-9]+$)", "", rownames(SODF))
> aggregate(. ~ aggvar, SODF, sum)
          aggvar Treatment1 Treatment2 Treatment3 Treatment4 Treatment5
1 Nasvi2EG000001         28         43         33         25         64
2 Nasvi2EG000002          0          3          0          0          4
3 Nasvi2EG000004          1          0          0          0          0
4 Nasvi2EG000009          0          4          2          0          4
5 Nasvi2EG000013         21          8         17         19          7
6 Nasvi2EG000014          0          7          0          0          7
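Note that the output above drops the "t" suffix from every row, whereas the desired output in the question keeps the full name for rows that were not merged. As a rough sketch of one way to get that, assuming the same "SODF" name and a toy copy of the sample data, base R's rowsum() can do the grouped sum while keeping the result keyed by row name. The variable names stem, shared, aggvar_keep and merged_keep are made up for illustration.

```r
# Toy copy of the sample rows from the question (an assumption: the real
# data frame has ~22500 rows and 48 numeric columns).
SODF <- data.frame(
  Treatment1 = c(28, 0, 0, 0, 1, 0, 21, 0, 0),
  Treatment2 = c(43, 3, 0, 0, 0, 4, 8, 3, 4),
  Treatment3 = c(33, 0, 0, 0, 0, 2, 17, 0, 0),
  Treatment4 = c(25, 0, 0, 0, 0, 0, 19, 0, 0),
  Treatment5 = c(64, 4, 0, 0, 0, 4, 7, 4, 3),
  row.names = c("Nasvi2EG000001t1", "Nasvi2EG000002t2", "Nasvi2EG000002t5",
                "Nasvi2EG000002t6", "Nasvi2EG000004t1", "Nasvi2EG000009t1",
                "Nasvi2EG000013t1", "Nasvi2EG000014t1", "Nasvi2EG000014t2")
)

# Row name with the trailing "t<digits>" removed
stem <- sub("t[0-9]+$", "", rownames(SODF))

# TRUE for rows whose stem occurs more than once, i.e. rows to be merged
shared <- stem %in% stem[duplicated(stem)]

# Use the stem for merged groups, the untouched row name otherwise
aggvar_keep <- ifelse(shared, stem, rownames(SODF))

# rowsum() sums every column within each group; groups become row names
merged_keep <- rowsum(SODF, group = aggvar_keep)
merged_keep
```

Passing stem instead of aggvar_keep as the group should give the same sums as the aggregate() call, just with the group labels as row names rather than in a separate column.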