
Counting the number of "0" in this factor

Consider the following factor:

x = factor(c("1|1","1|0","1|1","1|1","0|0","1|1","0|1"))

I would like to count the number of occurrences of the character "0" in this factor. The only solution I've found so far is:

sum(grepl("0",strsplit(paste(sapply(x, as.character), collapse=""), split="")[[1]]))
# [1] 4

This solution seems very complicated for such a simple process. Is there a "better" alternative? (As the process will be repeated about 100,000 times on factors that are 2000 elements long, I might end up caring about performance as well.)

Remi.b asked Mar 02 '17 01:03


1 Answer

x = factor(c("1|1","1|0","1|1","1|1","0|0","1|1","0|1"))
x
# [1] 1|1 1|0 1|1 1|1 0|0 1|1 0|1
# Levels: 0|0 0|1 1|0 1|1

# fixed = TRUE splits on the literal "|"; as a regex, "|" is an empty alternation
sum(unlist(lapply(strsplit(as.character(x), "|", fixed = TRUE), function(parts) length(grep("0", parts)))))
# [1] 4

or

sum(nchar(gsub("[1 |]", '', x )))
# [1] 4

Based on @Rich Scriven's comment, a simpler pattern that removes everything except "0":

sum(nchar(gsub("[^0]", '', x )))
# [1] 4

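Based on @thelatemail's comment: using tabulate is much faster than the solutions above, because gsub then runs only on the distinct levels of the factor and each level's zero count is weighted by how often that level occurs. The timings are compared further down.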

sum(nchar(gsub("[^0]", "", levels(x) )) * tabulate(x))
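To make the levels-plus-tabulate idea easier to follow, here is a minimal sketch that breaks the one-liner into its two parts. The names zeros_per_level, level_counts and count_zeros are illustrative and not from the original answer. Passing nbins = nlevels(f) is also an addition: it is assumed here as a guard so that the count vector stays aligned with the levels when some levels do not occur in the data.

```r
x <- factor(c("1|1", "1|0", "1|1", "1|1", "0|0", "1|1", "0|1"))

# Number of "0" characters in each distinct level (4 strings here,
# no matter how long x is).
zeros_per_level <- nchar(gsub("[^0]", "", levels(x)))

# How many times each level occurs in x. nbins is set explicitly
# (an assumption added in this sketch) so unused levels still get a 0.
level_counts <- tabulate(x, nbins = nlevels(x))

# Weighted sum: zeros in a level times occurrences of that level.
sum(zeros_per_level * level_counts)

# Hypothetical helper wrapping the same steps for repeated use.
count_zeros <- function(f) {
  sum(nchar(gsub("[^0]", "", levels(f))) * tabulate(f, nbins = nlevels(f)))
}
count_zeros(x)
```

Because the regex work depends only on the number of levels, a helper like this could be applied to many factors in turn, for example with sapply over a list of factors.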

Time Profile:

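library(stringr)  # provides str_count() and fixed() used in the last timing

x2 <- sample(x, 1e7, replace = TRUE)

# gsub over every element
system.time(sum(nchar(gsub("[^0]", "", x2))))
# user  system elapsed 
# 14.24    0.22   14.65 

# gsub over the levels only, weighted by tabulate
system.time(sum(nchar(gsub("[^0]", "", levels(x2))) * tabulate(x2)))
# user  system elapsed 
# 0.04    0.00    0.04 

# stringr::str_count over every element
system.time(sum(str_count(x2, fixed("0"))))
# user  system elapsed 
# 1.02    0.13    1.25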

Sathish answered Sep 18 '22 13:09