I would like to summarize the pass/fail status for my data as below. In other words, I would like to count the number of pass and fail cases for each product/type combination.
library(ggplot2)
library(plyr)
product=c("p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p1","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2","p2")
type=c("t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2","t1","t1","t1","t1","t1","t1","t2","t2","t2","t2","t2","t2")
skew=c("s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2","s1","s1","s1","s2","s2","s2")
color=c("c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3","c1","c2","c3")
result=c("pass","pass","fail","pass","pass","pass","fail","pass","fail","pass","fail","pass","fail","pass","fail","pass","pass","pass","pass","fail","fail","pass","pass","fail")
df = data.frame(product, type, skew, color, result)
The following command returns the total number of pass and fail cases combined, but I want separate columns for pass and fail:
dfSummary <- ddply(df, c("product", "type"), summarise, N=length(result))
Result is:
product type N
1 p1 t1 6
2 p1 t2 6
3 p2 t1 6
4 p2 t2 6
The desired result would be:
product type Pass Fail
1 p1 t1 5 1
2 p1 t2 3 3
3 p2 t1 4 2
4 p2 t2 3 3
I have attempted something like:
dfSummary <- ddply(df, c("product", "type"), summarise, Pass=length(df$product[df$result=="pass"]), Fail=length(df$product[df$result=="fail"]) )
but obviously it’s wrong, since the results are the grand totals for fail and pass rather than per-group counts.
Thanks in advance for your advice ! Regards, Riad.
Use the length() function to count the number of elements returned by the which() function. which() returns the positions at which a logical condition is TRUE, so length(which(x == value)) is the number of elements of x equal to value. The length() function in R gets or sets the length of a vector, list, or other object.
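As a minimal sketch of the length(which(...)) idiom, the snippet below uses a small made-up vector (not the data from the question) that includes one NA. The variable names fail_idx and n_fail are chosen purely for illustration.

```r
# Hypothetical sample vector, not the full data from the question
result <- c("pass", "pass", "fail", "pass", NA, "fail")

# which() returns the positions where the condition is TRUE;
# positions where the comparison is NA are skipped
fail_idx <- which(result == "fail")
print(fail_idx)

# The length of that index vector is the number of matches
n_fail <- length(fail_idx)
print(n_fail)
```

Because which() silently skips NA comparisons, this form does not need any extra NA handling.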
Method 2: Using sum(). The sum() function adds up the values passed to it. Given a logical expression such as x == value, it counts the elements for which the expression is TRUE, because each TRUE is treated as 1 and each FALSE as 0.
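A short sketch of counting with sum() on a logical vector, again on a made-up vector rather than the question's data. The na.rm = TRUE argument is included here as an assumption that missing values should be ignored; without it, a single NA makes the whole sum NA.

```r
# Hypothetical sample vector, not the full data from the question
result <- c("pass", "pass", "fail", "pass", NA, "fail")

# Each TRUE counts as 1 and each FALSE as 0
n_pass <- sum(result == "pass", na.rm = TRUE)
n_fail <- sum(result == "fail", na.rm = TRUE)
print(c(Pass = n_pass, Fail = n_fail))

# Without na.rm = TRUE the NA propagates into the total
n_pass_with_na <- sum(result == "pass")
print(n_pass_with_na)
```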
count() lets you quickly count the unique values of one or more variables: df %>% count(a, b) is roughly equivalent to df %>% group_by(a, b) %>% summarise(n = n()) . count() is paired with tally() , a lower-level helper that is equivalent to df %>% summarise(n = n()) .
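For a problem shaped like the one in the question, a dplyr version might look like the sketch below. It assumes the dplyr package is installed and uses a small made-up data frame with the same column names; long_counts and wide_counts are illustrative names only.

```r
library(dplyr)

# Small hypothetical data frame with the same columns as the question's df
df <- data.frame(
  product = c("p1", "p1", "p1", "p2", "p2", "p2"),
  type    = c("t1", "t1", "t2", "t1", "t1", "t1"),
  result  = c("pass", "fail", "pass", "pass", "pass", "fail")
)

# Long form: one row per product/type/result combination, count in column n
long_counts <- df %>% count(product, type, result)
print(long_counts)

# Wide form: one row per product/type with separate Pass and Fail columns
wide_counts <- df %>%
  group_by(product, type) %>%
  dplyr::summarise(Pass = sum(result == "pass"),
                   Fail = sum(result == "fail"),
                   .groups = "drop")
print(wide_counts)
```

summarise() is called as dplyr::summarise() here because plyr exports a function of the same name; if plyr is attached after dplyr, the unqualified call would pick up plyr's version and ignore the grouping.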
Try:
dfSummary <- ddply(df, c("product", "type"), summarise,
Pass=sum(result=="pass"), Fail=sum(result=="fail") )
Which gives me the result:
product type Pass Fail
1 p1 t1 5 1
2 p1 t2 3 3
3 p2 t1 4 2
4 p2 t2 3 3
Explanation:
1. You pass the data set df to the ddply function.
2. ddply splits on the variables "product" and "type". This results in length(unique(product)) * length(unique(type)) pieces (i.e. subsets of the data df), one for every combination of the two variables.
3. For each piece, ddply applies the function that you provide. In this case, you count how many rows have result=="pass" and how many have result=="fail".
4. ddply is left with some results for each piece, namely the variables you split on (product and type) and the results you requested (Pass and Fail).

You could also use reshape2::dcast:
library(reshape2)
dcast(product + type~result,data=df, fun.aggregate= length,value.var = 'result')
## product type fail pass
## 1 p1 t1 1 5
## 2 p1 t2 3 3
## 3 p2 t1 2 4
## 4 p2 t2 3 3
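The split, apply, and combine steps that ddply performs, as described in the explanation above, can be approximated in base R. This is only an illustrative sketch on a small made-up data frame, not how plyr is implemented internally; summarise_piece is a hypothetical helper name.

```r
# Small hypothetical data frame with the same columns as the question's df
df <- data.frame(
  product = c("p1", "p1", "p1", "p2", "p2", "p2"),
  type    = c("t1", "t1", "t2", "t1", "t1", "t1"),
  result  = c("pass", "fail", "pass", "pass", "pass", "fail")
)

# 1. Split: one piece per product/type combination that occurs in the data
pieces <- split(df, list(df$product, df$type), drop = TRUE)

# 2. Apply: count passes and fails within a single piece
summarise_piece <- function(piece) {
  data.frame(product = piece$product[1],
             type    = piece$type[1],
             Pass    = sum(piece$result == "pass"),
             Fail    = sum(piece$result == "fail"))
}
results <- lapply(pieces, summarise_piece)

# 3. Combine: stack the per-piece results into one data frame
dfSummary <- do.call(rbind, results)
rownames(dfSummary) <- NULL
print(dfSummary)
```

The rows may come out in a different order than ddply would produce, since split() orders the pieces by its own grouping factor.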