
More efficient way to get frequency counts across columns of data frame

Tags: r, dplyr

I have some survey data in which columns correspond to items and rows correspond to customers, with each cell saying how likely that customer is to buy that item. It looks like this:

item1 = c("Likely", "Unlikely", "Very Likely","Likely") 
item2 = c("Likely", "Unlikely", "Very Likely","Unlikely")
item3 = c("Very Likely", "Unlikely", "Very Likely","Likely") 
df = data.frame(item1, item2, item3) 

I want a summary table giving the percentage of each response for each item. Right now I'm using table() on each column, and it's a lot of code to manipulate. How can I do this with plyr, apply, or something faster?

Current solution:

d1 <- as.data.frame(table(df$item1))
d1$item1_percent <- d1$Freq / sum(d1$Freq)
names(d1) <- c("Response", "item1_freqs", "item1_percent")

d2 <- as.data.frame(table(df$item2))
d2$item2_percent <- d2$Freq / sum(d2$Freq)
names(d2) <- c("Response", "item2_freqs", "item2_percent")

d3 <- as.data.frame(table(df$item3))
d3$item3_percent <- d3$Freq / sum(d3$Freq)
names(d3) <- c("Response", "item3_freqs", "item3_percent")

results <- cbind(d1, d2[, 2:3], d3[, 2:3])

Note I don't really need the freq counts, just the percentages.

Thanks in advance!

asked Jun 15 '17 by SarahGC



2 Answers

Since each item column contains the same set of responses, you can use

sapply(df, function(x) prop.table(table(x)))
#             item1 item2 item3
# Likely       0.50  0.25  0.25
# Unlikely     0.25  0.50  0.25
# Very Likely  0.25  0.25  0.50

But if the columns had different sets of responses, you could first give them a common set of factor levels:

df[] <- lapply(df, factor, levels=unique(unlist(df)))
sapply(df, function(x) prop.table(table(x)))
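Since the question tags dplyr, the same summary can also be sketched in long-then-wide tidyverse style. This is an assumption-laden alternative, not part of the answer above: it assumes dplyr and tidyr (>= 1.0, for the pivot functions) are installed, and the column names `item` and `percent` are my own choices.

```r
# Tidyverse sketch (assumes dplyr and tidyr are available)
library(dplyr)
library(tidyr)

item1 <- c("Likely", "Unlikely", "Very Likely", "Likely")
item2 <- c("Likely", "Unlikely", "Very Likely", "Unlikely")
item3 <- c("Very Likely", "Unlikely", "Very Likely", "Likely")
df <- data.frame(item1, item2, item3)

res <- df %>%
  pivot_longer(everything(), names_to = "item", values_to = "Response") %>%  # stack all items
  count(item, Response) %>%                    # frequency of each item/response pair
  group_by(item) %>%
  mutate(percent = n / sum(n)) %>%             # within-item proportion
  select(-n) %>%                               # the question only needs percentages
  pivot_wider(names_from = item, values_from = percent, values_fill = 0)
res
```

A possible upside of this shape is that it copes naturally with columns whose response sets differ, since `values_fill = 0` fills in any missing item/response combinations.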
answered Oct 19 '22 by user20650

Consider a chained merge with Reduce: first loop through the columns of the data frame by index with lapply to build a list of data frames, then merge them all on Response:

dfList <- lapply(seq_along(df), function(i) {
  d <- as.data.frame(table(df[, i]))
  d$percent <- d$Freq / sum(d$Freq)
  # PASS COLUMN NUMBER INTO DF COLUMN NAMES
  names(d) <- c("Response", paste0("item", i, "_freqs"), paste0("item", i, "_percent"))
  d
})

# merge() has no all.equal argument; all = TRUE keeps every Response level
results2 <- Reduce(function(x, y) merge(x, y, by = "Response", all = TRUE), dfList)

# EQUIVALENT TO ORIGINAL results
all.equal(results, results2)
# [1] TRUE
identical(results, results2)
# [1] TRUE
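The question mentions that the freq counts aren't actually needed, only the percentages. As a minimal follow-up sketch (self-contained so it runs on its own; the `pct_only` name is my own), the `*_freqs` columns can be dropped from the merged result by name pattern:

```r
# Rebuild the merged table, then keep only Response and the *_percent columns
item1 <- c("Likely", "Unlikely", "Very Likely", "Likely")
item2 <- c("Likely", "Unlikely", "Very Likely", "Unlikely")
item3 <- c("Very Likely", "Unlikely", "Very Likely", "Likely")
df <- data.frame(item1, item2, item3)

dfList <- lapply(seq_along(df), function(i) {
  d <- as.data.frame(table(df[, i]))
  d$percent <- d$Freq / sum(d$Freq)
  names(d) <- c("Response", paste0("item", i, "_freqs"), paste0("item", i, "_percent"))
  d
})
results2 <- Reduce(function(x, y) merge(x, y, by = "Response", all = TRUE), dfList)

# Select Response plus every column ending in _percent
pct_only <- results2[, c("Response", grep("_percent$", names(results2), value = TRUE))]
pct_only
```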
answered Oct 19 '22 by Parfait