
Vectorization of a for-loop in R

I have two vectors:

  • A vector of texts: c('abc', 'asdf', 'werd', 'ffssd')
  • A vector of patterns: c('ab', 'd', 'w')

I'd like to vectorize the following for-loop:

count <- 0
# iterate over the patterns themselves, not their indices
for (p in patterns) {
    count <- count + str_count(texts, p)
}

I tried the following commands, but neither works:

> str_count(texts, patterns)
[1] 1 1 1 0
Warning message:
In stri_count_regex(string, pattern, opts_regex = attr(pattern,  :
  longer object length is not a multiple of shorter object length

> str_count(texts, t(patterns))
[1] 1 1 1 0
Warning message:
In stri_count_regex(string, pattern, opts_regex = attr(pattern,  :
  longer object length is not a multiple of shorter object length

I'd like a 2D matrix like this, with one row per text and one column per pattern:

       |  patterns
 ------+--------
       |   1 0 0
 texts |   0 1 0
       |   0 1 1
       |   0 1 0
frogatto asked Dec 11 '15 20:12



1 Answer

You can use outer. I assume you are using str_count from the stringr package.

library(stringr)

texts <- c('abc', 'asdf', 'werd', 'ffssd')
patterns <- c('ab', 'd', 'w')

matches <- outer(texts, patterns, str_count)

# set dim names
colnames(matches) <- patterns
rownames(matches) <- texts
matches
      ab d w
abc    1 0 0
asdf   0 1 0
werd   0 1 1
ffssd  0 1 0
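
If what you actually need is the single running total that the loop in the question accumulates (one count per text, summed over all patterns), the row sums of this matrix should give it. A minimal sketch, assuming stringr is installed; `total` is a name introduced here for illustration only:

```r
library(stringr)

texts <- c('abc', 'asdf', 'werd', 'ffssd')
patterns <- c('ab', 'd', 'w')

# one row per text, one column per pattern
matches <- outer(texts, patterns, str_count)

# summing across the columns gives one total per text,
# which is what the loop accumulates in 'count'
total <- rowSums(matches)
```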

EDIT

# or set names directly within 'outer' as noted by @RichardScriven
outer(setNames(nm = texts), setNames(nm = patterns), str_count)
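
A note on why this works: outer expands the two inputs into every text/pattern pair and calls the function once on the two long vectors, so it relies on str_count being vectorized over both arguments. A function that is not vectorized that way would need wrapping first (for example with Vectorize). As a rough equivalent that makes the pairing explicit, here is a sketch using sapply over the patterns; `matches2` is just an illustrative name, not something from the question:

```r
library(stringr)

texts <- c('abc', 'asdf', 'werd', 'ffssd')
patterns <- c('ab', 'd', 'w')

# for each pattern, count its occurrences in every text;
# sapply binds the resulting vectors together as columns
# and names the columns after the patterns
matches2 <- sapply(patterns, function(p) str_count(texts, p))
rownames(matches2) <- texts
```

Both versions treat each pattern as a regular expression, which is the default in str_count.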
cdeterman answered Oct 01 '22 02:10