I have a data frame like the one below, but with a lot more rows
> df<-data.frame(x1=c(1,1,0,0,1,0),x2=c("a","a","b","a","c","c"))
> df
  x1 x2
1  1  a
2  1  a
3  0  b
4  0  a
5  1  c
6  0  c
From df, I want a data frame where the rows are the unique values of df$x2, col1 is the proportion of 1s associated with each letter, and col2 is the count of each letter. So my output would be:
> getprops(df)
    prop count
a 0.6666     3
b 0.0000     1
c 0.5000     2
I can think of some elaborate, dirty ways to do this, but I'm looking for something short and efficient. Thanks
I like @RicardoSaporta's solution (+1), but you can use ?prop.table as well:
> df<-data.frame(x1=c(1,1,0,0,1,0),x2=c("a","a","b","a","c","c"))
> df
  x1 x2
1  1  a
2  1  a
3  0  b
4  0  a
5  1  c
6  0  c
> tab <- table(df$x2, df$x1)
> tab
    0 1
  a 1 2
  b 1 0
  c 1 1
> ptab <- prop.table(tab, margin=1)
> ptab
            0         1
  a 0.3333333 0.6666667
  b 1.0000000 0.0000000
  c 0.5000000 0.5000000
> dframe <- data.frame(values=rownames(tab), prop=ptab[,2], count=rowSums(tab))
> dframe
  values      prop count
a      a 0.6666667     3
b      b 0.0000000     1
c      c 0.5000000     2
If you'd like, you can put this together into a single function:
getprops <- function(values, indicator){
  tab <- table(values, indicator)
  ptab <- prop.table(tab, margin=1)
  # prop: share of 1s per value (column 2, assuming a 0/1 indicator with both levels present)
  # count: total number of rows per value
  dframe <- data.frame(values=rownames(tab), prop=ptab[,2], count=rowSums(tab))
  return(dframe)
}
> getprops(values=df$x2, indicator=df$x1)
  values      prop count
a      a 0.6666667     3
b      b 0.0000000     1
c      c 0.5000000     2
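For what it's worth, prop.table(tab, margin=1) just divides each row of the table by that row's total. A minimal sketch that checks this on the same toy data; the names tab_check, row_totals and manual are made up for illustration and are not part of the answer above:

```r
# Same toy data as in the question
df <- data.frame(x1 = c(1, 1, 0, 0, 1, 0),
                 x2 = c("a", "a", "b", "a", "c", "c"))

tab_check  <- table(df$x2, df$x1)   # rows: letters, columns: 0 and 1
row_totals <- rowSums(tab_check)    # how many times each letter occurs

# Divide every row by its own total, which is what margin = 1 asks for
manual <- sweep(tab_check, 1, row_totals, "/")

# Compare the hand-rolled version with prop.table()
all.equal(as.vector(manual),
          as.vector(prop.table(tab_check, margin = 1)))
```

The row totals are also the per-letter counts the question asks for, so both output columns can be read off the same table.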
Try installing plyr and running
library(plyr)
df <- data.frame(x1=c(1, 1, 0, 0, 1, 0),
                 label=c("a", "a", "b", "a", "c", "c"))
ddply(df, .(label), summarize, prop = mean(x1), count = length(x1))
#   label      prop count
# 1     a 0.6666667     3
# 2     b 0.0000000     1
# 3     c 0.5000000     2
which under the hood applies a split/apply/combine method similar to this in base R:
# split by label, summarize each piece, then row-bind the pieces
do.call(rbind, lapply(split(df, df$label),
                      function(d) data.frame(prop = mean(d$x1),
                                             count = nrow(d))))
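The reason mean(x1) works as a proportion is that x1 is coded 0/1, so the group mean is the share of 1s in that group. A rough base R sketch that leans on this with tapply, assuming the grouping column is called x2 as in the question; result is just an illustrative name:

```r
# Same toy data as in the question (grouping column named x2 here)
df <- data.frame(x1 = c(1, 1, 0, 0, 1, 0),
                 x2 = c("a", "a", "b", "a", "c", "c"))

# x1 is 0/1, so the mean within each letter is the proportion of 1s
prop  <- tapply(df$x1, df$x2, mean)
# number of rows for each letter
count <- tapply(df$x1, df$x2, length)

# One row per letter, labelled by the letter itself
result <- data.frame(prop = as.vector(prop),
                     count = as.vector(count),
                     row.names = names(prop))
result
```

This would not give a proportion if x1 held anything other than 0s and 1s, so it may be worth checking that first on real data.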