Consider I have four objects (a,b,c,d
), and I ask five persons to label them (category 1 or 2) according to their physical appearance or something else. The labels provided by five persons for these objects are shown as
df <- data.frame(a = c(1,2,1,2,1), b=c(1,2,2,1,1), c= c(2,1,2,2,2), d=c(1,2,1,2,1))
In tabular format,
---------
a b c d
---------
1 1 2 1
2 2 1 2
1 2 2 1
2 1 2 2
1 1 2 1
----------
Now I want to calculate the percentage of times a group of objects were given the same label (either 1 or 2). For example, objects a, b and d were given the same label by 3 persons out of 5 persons. So its percentage is 3/5 (=60%). While as objects a and d were given same labels by all the people, so its percentage is 5/5 (=100%)
I can calculate this statistic manually, but in my original dataset, I have 50 such objects and the people are 30 and the labels are 4 (1,2,3, and 4). How can I compute such statistics for this bigger dataset automatically? Are there any existing packages/tools in R
which can calculate such statistics?
Note: A group can be of any size. In the first example, a group consists of a,b and d while as second example group consists of a and d.
If you have numeric ratings, you could use diff
to check if you consistently have 0
difference between each rater:
f <- function(cols, data) {
sum(colSums(diff(t(data[cols]))==0)==(length(cols)-1)) / nrow(data)
}
Results are as expected when applying the function to example groups:
f(c("a","b","d"), df)
#[1] 0.6
f(c("a","d"), df)
#[1] 1
There are two tasks here: firstly, making a list of all the relevant combinations, and secondly, evaluating and aggregating rowwise similarity. combn
can start the first task, but it takes a little massaging to arrange the results into a neat list. The second task could be handled with prop.table
, but here it's simpler to calculate directly.
Here I've used tidyverse
grammar (primarily purrr
, which is helpful for handling lists), but convert into base if you like.
library(tidyverse)
map(2:length(df), ~combn(names(df), .x, simplify = FALSE)) %>% # get combinations
flatten() %>% # eliminate nesting
set_names(map_chr(., paste0, collapse = '')) %>% # add useful names
# subset df with combination, see if each row has only one unique value
map(~apply(df[.x], 1, function(x){n_distinct(x) == 1})) %>%
map_dbl(~sum(.x) / length(.x)) # calculate TRUE proportion
## ab ac ad bc bd cd abc abd acd bcd abcd
## 0.6 0.2 1.0 0.2 0.6 0.2 0.0 0.6 0.2 0.0 0.0
With base R functions you could do:
groupVec = c("a","b","d")
transDF = t(as.matrix(DF))
subDF = transDF[rownames(transDF) %in% groupVec,]
subDF
# [,1] [,2] [,3] [,4] [,5]
# a 1 2 1 2 1
# b 1 2 2 1 1
# d 1 2 1 2 1
#if length of unique values is 1, it implies match across all objects, count unique values/total columns = match pct
match_pct = sum(sapply(as.data.frame(subDF), function(x) sum(length(unique(x))==1) ))/ncol(subDF)
match_pct
# [1] 0.6
Wrapping it in a custom funtion:
fn_matchPercent = function(groupVec = c("a","d") ) {
transDF = t(as.matrix(DF))
subDF = transDF[rownames(transDF) %in% groupVec,]
match_pct = sum(sapply(as.data.frame(subDF), function(x) sum(length(unique(x))==1) ))/ncol(subDF)
outputDF = data.frame(groups = paste0(groupVec,collapse=",") ,match_pct = match_pct)
return(outputDF)
}
fn_matchPercent(c("a","d"))
# groups match_pct
# 1 a,d 1
fn_matchPercent(c("a","b","d"))
# groups match_pct
# 1 a,b,d 0.6
Try this:
find.unanimous.percentage <- function(df, at.a.time) {
cols <- as.data.frame(t(combn(names(df), at.a.time)))
names(cols) <- paste('O', 1:at.a.time, sep='')
cols$percent.unanimous <- 100*colMeans(apply(cols, 1, function(x) apply(df[x], 1, function(y) length(unique(y)) == 1)))
return(cols)
}
find.unanimous.percentage(df, 2) # take 2 at a time
O1 O2 percent.unanimous
1 a b 60
2 a c 20
3 a d 100
4 b c 20
5 b d 60
6 c d 20
find.unanimous.percentage(df, 3) # take 3 at a time
O1 O2 O3 percent.unanimous
1 a b c 0
2 a b d 60
3 a c d 20
4 b c d 0
find.unanimous.percentage(df, 4)
O1 O2 O3 O4 percent.unanimous
1 a b c d 0
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With