I have some survey data in which columns correspond to items and rows correspond to customers indicating how likely they are to buy each item. It looks like this:
item1 = c("Likely", "Unlikely", "Very Likely","Likely")
item2 = c("Likely", "Unlikely", "Very Likely","Unlikely")
item3 = c("Very Likely", "Unlikely", "Very Likely","Likely")
df = data.frame(item1, item2, item3)
I want a summary table giving the percentage of each response for each item. Right now I'm using table() on each column, and it's a lot of code to manipulate. How can I do this with plyr, the apply family, or something faster?
Current solution:
d1<-as.data.frame(table(df$item1))
d1$item1_percent<- d1$Freq/sum(d1$Freq)
names(d1)<-c("Response","item1_freqs","item1_percent")
d2<-as.data.frame(table(df$item2))
d2$item2_percent<- d2$Freq/sum(d2$Freq)
names(d2)<-c("Response","item2_freqs","item2_percent")
d3<-as.data.frame(table(df$item3))
d3$item3_percent<- d3$Freq/sum(d3$Freq)
names(d3)<-c("Response","item3_freqs","item3_percent")
results<-cbind(d1,d2[,2:3],d3[,2:3])
Note I don't really need the freq counts, just the percentages.
Thanks in advance!
As you have the same set of responses in each item#, you can use
sapply(df, function(x) prop.table(table(x)))
# item1 item2 item3
# Likely 0.50 0.25 0.25
# Unlikely 0.25 0.50 0.25
# Very Likely 0.25 0.25 0.50
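If you want percentages rather than proportions, you can scale the same result by 100; a minimal sketch using only base R:
# SCALE PROPORTIONS TO PERCENTAGES AND ROUND FOR DISPLAY
round(100 * sapply(df, function(x) prop.table(table(x))), 1)
#             item1 item2 item3
# Likely         50    25    25
# Unlikely       25    50    25
# Very Likely    25    25    50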
That works because every item# contains the same set of responses. If they differed, you could first give each item# a common set of levels:
df[] <- lapply(df, factor, levels=unique(unlist(df)))
sapply(df, function(x) prop.table(table(x)))
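For example, suppose a hypothetical item4 never received an "Unlikely" response; the shared levels guarantee it still gets an "Unlikely" row (with proportion 0), so the columns line up. A quick sketch under that assumption:
item4 <- c("Very Likely", "Likely", "Very Likely", "Likely")  # hypothetical: no "Unlikely"
df2 <- data.frame(item1, item2, item3, item4)
# GIVE EVERY COLUMN THE SAME FACTOR LEVELS SO table() RETURNS ZERO COUNTS
df2[] <- lapply(df2, factor, levels = unique(unlist(df2)))
sapply(df2, function(x) prop.table(table(x)))
#             item1 item2 item3 item4
# Likely       0.50  0.25  0.25  0.50
# Unlikely     0.25  0.50  0.25  0.00
# Very Likely  0.25  0.25  0.50  0.50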
Consider a chained merge with Reduce: first loop through each column of the dataframe by index with lapply to build a list of dataframes, then pass that list to Reduce, which merges the dataframes pairwise on Response:
dfList <- lapply(seq_along(df), function(i){
  # BUILD FREQ AND PERCENT COLUMNS FOR ONE ITEM
  d <- as.data.frame(table(df[,i]))
  d$percent <- d$Freq/sum(d$Freq)
  # PASS COLUMN NUMBER INTO DF COLUMN NAMES
  names(d) <- c("Response", paste0("item",i,"_freqs"), paste0("item",i,"_percent"))
  return(d)
})
results2 <- Reduce(function(x,y) merge(x, y, by="Response", all=TRUE), dfList)
# EQUIVALENT TO ORIGINAL results
all.equal(results, results2)
# [1] TRUE
identical(results, results2)
# [1] TRUE
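Since you don't need the freq counts, you can drop those columns afterwards by name pattern; a small sketch assuming the item#_freqs naming used above:
# KEEP Response AND THE *_percent COLUMNS ONLY
results_pct <- results2[, !grepl("_freqs$", names(results2))]
results_pct
#      Response item1_percent item2_percent item3_percent
# 1      Likely          0.50          0.25          0.25
# 2    Unlikely          0.25          0.50          0.25
# 3 Very Likely          0.25          0.25          0.50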