I have a dataset that looks something like this:
Type Age count1 count2 Year Pop1 Pop2 TypeDescrip
A 35 1 1 1990 30000 50000 alpha
A 35 3 1 1990 30000 50000 alpha
A 45 2 3 1990 20000 70000 alpha
B 45 2 1 1990 20000 70000 beta
B 45 4 5 1990 20000 70000 beta
I want to add the counts of the rows that are matching in the Type and Age columns. So ideally I would end up with a dataset that looks like this:
Type Age count1 count2 Year Pop1 Pop2 TypeDescrip
A 35 4 2 1990 30000 50000 alpha
A 45 2 3 1990 20000 70000 alpha
B 45 6 6 1990 20000 70000 beta
I've tried using nested duplicated()
statements such as below:
typedup = duplicated(df$Type)
bothdup = duplicated(df[(typedup == TRUE),]$Age)
but this returns indices for which age or type are duplicated, not necessarily when one row has duplicates of both.
I've also tried tapply:
tapply(c(df$count1, df$count2), c(df$Age, df$Type), sum)
but this output is difficult to work with. I want to have a data.frame when I'm done.
I don't want to use a for-loop because my dataset is quite large.
Try
library(dplyr)
df1 %>%
group_by(Type, Age) %>%
summarise_each(funs(sum))
# Type Age count1 count2
#1 A 35 4 2
#2 A 45 2 3
#3 B 45 6 6
In the newer versions of dplyr
df1 %>%
group_by(Type, Age) %>%
summarise_all(sum)
Or using base R
aggregate(.~Type+Age, df1, FUN=sum)
# Type Age count1 count2
#1 A 35 4 2
#2 A 45 2 3
#3 B 45 6 6
Or
library(data.table)
setDT(df1)[, lapply(.SD, sum), .(Type, Age)]
# Type Age count1 count2
#1: A 35 4 2
#2: A 45 2 3
#3: B 45 6 6
Based on the new dataset,
df2 %>%
group_by(Type, Age,Pop1, Pop2, TypeDescrip) %>%
summarise_each(funs(sum), matches('^count'))
# Type Age Pop1 Pop2 TypeDescrip count1 count2
#1 A 35 30000 50000 alpha 4 2
#2 A 45 20000 70000 beta 2 3
#3 B 45 20000 70000 beta 6 6
df1 <- structure(list(Type = c("A", "A", "A", "B", "B"), Age = c(35L,
35L, 45L, 45L, 45L), count1 = c(1L, 3L, 2L, 2L, 4L), count2 = c(1L,
1L, 3L, 1L, 5L)), .Names = c("Type", "Age", "count1", "count2"
), class = "data.frame", row.names = c(NA, -5L))
df2 <- structure(list(Type = c("A", "A", "A", "B", "B"), Age = c(35L,
35L, 45L, 45L, 45L), count1 = c(1L, 3L, 2L, 2L, 4L), count2 = c(1L,
1L, 3L, 1L, 5L), Year = c(1990L, 1990L, 1990L, 1990L, 1990L),
Pop1 = c(30000L, 30000L, 20000L, 20000L, 20000L), Pop2 = c(50000L,
50000L, 70000L, 70000L, 70000L), TypeDescrip = c("alpha",
"alpha", "beta", "beta", "beta")), .Names = c("Type", "Age",
"count1", "count2", "Year", "Pop1", "Pop2", "TypeDescrip"),
class = "data.frame", row.names = c(NA, -5L))
@hannah you can also use sql using the sqldf package
sqldf("select
Type,Age,
sum(count1) as sum_count1,
sum(count2) as sum_count2
from
df
group by
Type,Age
")
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With