R programming: plyr how to count values from a column with ddply [duplicate]

Tags: r, plyr

I would like to summarize the pass/fail status of my data, as shown below. In other words, I would like to count the pass and fail cases for each product/type combination.

library(ggplot2)
library(plyr)
product=c("p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2")
type=c("t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2","t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2")
skew=c("s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2")
color=c("c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3")
result=c("pass","pass","fail","pass","pass","pass","fail","pass","fail","pass","fail","pass","fail","pass","fail","pass","pass","pass","pass","fail","fail","pass","pass","fail")
df = data.frame(product, type, skew, color, result)

The following command returns the total number of pass and fail cases combined, but I want separate columns for pass and fail:

dfSummary <- ddply(df, c("product", "type"), summarise, N=length(result))

Result is:

        product type N
 1      p1      t1   6
 2      p1      t2   6
 3      p2      t1   6
 4      p2      t2   6

The desired result would be:

         product type Pass Fail
 1       p1      t1   5    1
 2       p1      t2   3    3
 3       p2      t1   4    2
 4       p2      t2   3    3

I have attempted something like:

 dfSummary <- ddply(df, c("product", "type"), summarise, Pass=length(df$product[df$result=="pass"]), Fail=length(df$product[df$result=="fail"]) )

but it is clearly wrong, since the results are the grand totals for pass and fail.
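
A minimal sketch of why the attempt returns grand totals, assuming the data from the question (rebuilt compactly here with rep(), keeping only the columns that matter). Inside summarise, df$result reaches past the current piece and reads the whole data frame, so every group sees the same 24 rows. The names attempt, whole_table_pass and whole_table_fail are only illustrative.

```r
library(plyr)

# Compact rebuild of the question's data (only the columns used here).
product <- rep(c("p1", "p2"), each = 12)
type    <- rep(rep(c("t1", "t2"), each = 6), times = 2)
result  <- c("pass", "pass", "fail", "pass", "pass", "pass",
             "fail", "pass", "fail", "pass", "fail", "pass",
             "fail", "pass", "fail", "pass", "pass", "pass",
             "pass", "fail", "fail", "pass", "pass", "fail")
df <- data.frame(product, type, result)

# The attempt from the question: df$... always refers to the full table,
# not to the subset ddply is currently working on.
attempt <- ddply(df, c("product", "type"), summarise,
                 Pass = length(df$product[df$result == "pass"]),
                 Fail = length(df$product[df$result == "fail"]))

# Counts over the whole table, for comparison with every row of `attempt`.
whole_table_pass <- sum(df$result == "pass")
whole_table_fail <- sum(df$result == "fail")
print(attempt)
```

Every row of attempt should carry the same two numbers as whole_table_pass and whole_table_fail, which is the symptom described above.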

Thanks in advance for your advice ! Regards, Riad.

asked Nov 20 '13 17:11

Riad

2 Answers

Try:

dfSummary <- ddply(df, c("product", "type"), summarise, 
                   Pass=sum(result=="pass"), Fail=sum(result=="fail") )

which gives me the result:

  product type Pass Fail
1      p1   t1    5    1
2      p1   t2    3    3
3      p2   t1    4    2
4      p2   t2    3    3
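
The counting relies on R treating TRUE as 1 and FALSE as 0 when a logical vector is summed. A small sketch, using the six result values of the p1/t1 group from the question (the variable names are only illustrative):

```r
# The six results for product p1, type t1, copied from the question's data.
result <- c("pass", "pass", "fail", "pass", "pass", "pass")

# Comparing gives a logical vector; summing it counts the TRUE entries.
is_pass <- result == "pass"
n_pass  <- sum(is_pass)
n_fail  <- sum(result == "fail")

print(c(Pass = n_pass, Fail = n_fail))
```

This should match the first row of the table above (5 pass, 1 fail).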

Explanation:

You pass the data set df to the ddply function.
ddply splits it on the variables "product" and "type".
- This yields one piece (a subset of df) for every combination of the two variables that occurs in the data, so at most length(unique(product)) * length(unique(type)) pieces. Here that is 2 * 2 = 4.
To each piece, ddply applies the function you provide. Here, summarise counts the rows in the piece where result=="pass" and the rows where result=="fail".
ddply now holds a result for each piece: the variables you split on (product and type) plus the values you requested (Pass and Fail).
It combines the per-piece results into a single data frame and returns it.
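
The steps above can be sketched by hand in base R. This is only an illustration of the split/apply/combine idea under the question's data, not how plyr is implemented internally; the names pieces, summaries and manual are made up for the example.

```r
# Compact rebuild of the question's data (only the columns used here).
product <- rep(c("p1", "p2"), each = 12)
type    <- rep(rep(c("t1", "t2"), each = 6), times = 2)
result  <- c("pass", "pass", "fail", "pass", "pass", "pass",
             "fail", "pass", "fail", "pass", "fail", "pass",
             "fail", "pass", "fail", "pass", "pass", "pass",
             "pass", "fail", "fail", "pass", "pass", "fail")
df <- data.frame(product, type, result)

# Split: one data frame per product/type combination present in the data.
pieces <- split(df, list(df$product, df$type), drop = TRUE)

# Apply: summarise each piece on its own rows only.
summaries <- lapply(pieces, function(piece) {
  data.frame(product = piece$product[1],
             type    = piece$type[1],
             Pass    = sum(piece$result == "pass"),
             Fail    = sum(piece$result == "fail"))
})

# Combine: stack the per-piece summaries and sort them like ddply's output.
manual <- do.call(rbind, summaries)
manual <- manual[order(manual$product, manual$type), ]
rownames(manual) <- NULL
print(manual)
```

The important difference from the attempt in the question is that each count uses piece$result, the rows of the current subset, rather than the full df.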

answered Sep 21 '22 03:09

ialm

You could also use reshape2::dcast.

library(reshape2)
dcast(product + type~result,data=df, fun.aggregate= length,value.var = 'result')
##   product type fail pass
## 1      p1   t1    1    5
## 2      p1   t2    3    3
## 3      p2   t1    2    4
## 4      p2   t2    3    3

answered Sep 18 '22 03:09

mnel
