
How to get proportions and counts of a data frame in R

Tags:

r

I have a data frame like the one below, but with many more rows:

> df<-data.frame(x1=c(1,1,0,0,1,0),x2=c("a","a","b","a","c","c"))
> df
  x1 x2
1  1  a
2  1  a
3  0  b
4  0  a
5  1  c
6  0  c

From df I want a data frame where the rows are the unique values of df$x2 and col1 is the proportion of 1s associated with each letter, and col2 is the count of each letter. So, my output would be

 > getprops(df)
  prop   count
a  .6666   3
b  0       1
c  0.5     2

I can think of some elaborate, dirty ways to do this, but I'm looking for something short and efficient. Thanks

Ben asked Jul 07 '13 02:07


2 Answers

I like @RicardoSaporta's solution (+1), but you can use ?prop.table as well:

> df<-data.frame(x1=c(1,1,0,0,1,0),x2=c("a","a","b","a","c","c"))
> df
  x1 x2
1  1  a
2  1  a
3  0  b
4  0  a
5  1  c
6  0  c
> tab <- table(df$x2, df$x1)
> tab

    0 1
  a 1 2
  b 1 0
  c 1 1
> ptab <- prop.table(tab, margin=1)
> ptab

            0         1
  a 0.3333333 0.6666667
  b 1.0000000 0.0000000
  c 0.5000000 0.5000000
> dframe <- data.frame(values=rownames(tab), prop=ptab[,2], count=rowSums(tab))
> dframe
  values      prop count
a      a 0.6666667     3
b      b 0.0000000     1
c      c 0.5000000     2

If you'd like, you can put this together into a single function:

getprops <- function(values, indicator){
  tab    <- table(values, indicator)
  ptab   <- prop.table(tab, margin=1)
  # prop: share of 1s per value; count: number of rows per value
  dframe <- data.frame(values=rownames(tab), prop=ptab[,2], count=rowSums(tab))
  return(dframe)
}

> getprops(values=df$x2, indicator=df$x1)
  values      prop count
a      a 0.6666667     3
b      b 0.0000000     1
c      c 0.5000000     2
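
As a sanity check on the approach above: prop.table(tab, margin=1) divides each row of the table by its row total, so the number of rows per letter is simply the row sums of tab. A minimal sketch using the question's data (by_hand, same and counts are names made up for illustration):

```r
df <- data.frame(x1 = c(1, 1, 0, 0, 1, 0),
                 x2 = c("a", "a", "b", "a", "c", "c"))

tab <- table(df$x2, df$x1)

# margin = 1 normalises each row by its row total
by_hand <- tab / rowSums(tab)
same <- isTRUE(all.equal(as.vector(prop.table(tab, margin = 1)),
                         as.vector(by_hand)))

# the row totals are the number of rows per letter
counts <- rowSums(tab)
```

Here same should be TRUE, and counts holds the per-letter counts that the question asks for.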

gung - Reinstate Monica answered Sep 22 '22 02:09


Try installing plyr and running

library(plyr)
df <- data.frame(x1=c(1, 1, 0, 0, 1, 0),
                 label=c("a", "a", "b", "a", "c", "c"))
ddply(df, .(label), summarize, prop = mean(x1), count = length(x1))
#   label      prop count
# 1     a 0.6666667     3
# 2     b 0.0000000     1
# 3     c 0.5000000     2

which under the hood applies a split/apply/combine method similar to this in base R:

do.call(rbind, lapply(split(df, df$label),
                            with, list(prop  = mean(x1),
                                       count = length(x1))))
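
The split/apply/combine steps can also be spelled out one at a time. This is only a sketch in base R, not plyr's actual internals; pieces, summaries and result are names made up for illustration:

```r
df <- data.frame(x1 = c(1, 1, 0, 0, 1, 0),
                 label = c("a", "a", "b", "a", "c", "c"))

# split: one sub-data-frame per label
pieces <- split(df, df$label)

# apply: summarise each piece as a one-row data frame
summaries <- lapply(pieces, function(d) {
  data.frame(prop = mean(d$x1), count = nrow(d))
})

# combine: stack the rows; row names are taken from the list names
result <- do.call(rbind, summaries)
result
```

Returning a one-row data frame from each piece, rather than a list, means the combined result is an ordinary data frame with numeric columns.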

Adrian answered Sep 20 '22 02:09