I have some survey data in which columns correspond to items and rows correspond to customers indicating how likely they are to buy each item. It looks like this:
item1 = c("Likely", "Unlikely", "Very Likely","Likely")
item2 = c("Likely", "Unlikely", "Very Likely","Unlikely")
item3 = c("Very Likely", "Unlikely", "Very Likely","Likely")
df = data.frame(item1, item2, item3)
I want a summary table giving the percentage of each response for each item. Right now I'm using table() on each column, and it's a lot of code to manipulate. How can I do this with plyr, the apply family, or something faster?
Current solution:
d1<-as.data.frame(table(df$item1))
d1$item1_percent<- d1$Freq/sum(d1$Freq)
names(d1)<-c("Response","item1_freqs","item1_percent")
d2<-as.data.frame(table(df$item2))
d2$item2_percent<- d2$Freq/sum(d2$Freq)
names(d2)<-c("Response","item2_freqs","item2_percent")
d3<-as.data.frame(table(df$item3))
d3$item3_percent<- d3$Freq/sum(d3$Freq)
names(d3)<-c("Response","item3_freqs","item3_percent")
results<-cbind(d1,d2[,2:3],d3[,2:3])
Note I don't really need the freq counts, just the percentages.
Thanks in advance!
As you have the same set of responses in each item#, you can use
sapply(df, function(x) prop.table(table(x)))
# item1 item2 item3
# Likely 0.50 0.25 0.25
# Unlikely 0.25 0.50 0.25
# Very Likely 0.25 0.25 0.50
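If you want percentages rather than proportions, you can scale the same result by 100; a minimal sketch using only base R:
# SCALE PROPORTIONS TO PERCENTAGES AND ROUND FOR DISPLAY
round(100 * sapply(df, function(x) prop.table(table(x))), 1)
#             item1 item2 item3
# Likely         50    25    25
# Unlikely       25    50    25
# Very Likely    25    25    50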
That works because every item# contains the same set of responses. If they differed, you could first give each item# a common set of levels:
df[] <- lapply(df, factor, levels=unique(unlist(df)))
sapply(df, function(x) prop.table(table(x)))
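For example, suppose a hypothetical item4 never received an "Unlikely" response; the shared levels guarantee it still gets an "Unlikely" row (with proportion 0), so the columns line up. A quick sketch under that assumption:
item4 <- c("Very Likely", "Likely", "Very Likely", "Likely")  # hypothetical: no "Unlikely"
df2 <- data.frame(item1, item2, item3, item4)
# GIVE EVERY COLUMN THE SAME FACTOR LEVELS SO table() RETURNS ZERO COUNTS
df2[] <- lapply(df2, factor, levels = unique(unlist(df2)))
sapply(df2, function(x) prop.table(table(x)))
#             item1 item2 item3 item4
# Likely       0.50  0.25  0.25  0.50
# Unlikely     0.25  0.50  0.25  0.00
# Very Likely  0.25  0.25  0.50  0.50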
Consider a chained merge with Reduce: first loop through each column of the dataframe by index with lapply to build a list of dataframes, then pass that list to Reduce, which merges the dataframes pairwise on Response:
dfList <- lapply(seq_along(df), function(i){
  # BUILD FREQ AND PERCENT COLUMNS FOR ONE ITEM
  d <- as.data.frame(table(df[,i]))
  d$percent <- d$Freq/sum(d$Freq)
  # PASS COLUMN NUMBER INTO DF COLUMN NAMES
  names(d) <- c("Response", paste0("item",i,"_freqs"), paste0("item",i,"_percent"))
  return(d)
})
results2 <- Reduce(function(x,y) merge(x, y, by="Response", all=TRUE), dfList)
# EQUIVALENT TO ORIGINAL results
all.equal(results, results2)
# [1] TRUE
identical(results, results2)
# [1] TRUE
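Since you don't need the freq counts, you can drop those columns afterwards by name pattern; a small sketch assuming the item#_freqs naming used above:
# KEEP Response AND THE *_percent COLUMNS ONLY
results_pct <- results2[, !grepl("_freqs$", names(results2))]
results_pct
#      Response item1_percent item2_percent item3_percent
# 1      Likely          0.50          0.25          0.25
# 2    Unlikely          0.25          0.50          0.25
# 3 Very Likely          0.25          0.25          0.50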