
What exactly does sapply with '[' do?

Tags: string, split, r

I was browsing some answers concerning strsplit in R. Example text:

fileName <- c("hello.w-rp-al",
              "how.nez-r",
              "do.qs-sdz",
              "you.d-aerd",
              "do.dse-e")

I wanted to get the first element of the created list and thought I could use something such as

fileNameSplit <- strsplit(fileName, "[.]")
node_1 <- fileNameSplit[0]
node_2 <- fileNameSplit[1]

But that didn't work.

Then I found this answer that suggests using sapply with [. This does work.

d <- data.frame(fileName)
fileNameSplit <- strsplit(d$fileName, "[.]")
d$node_1 <- sapply(fileNameSplit, "[", 1)
d$node_2 <- sapply(fileNameSplit, "[", 2)

However, I'm trying to figure out why. What exactly is happening, and what does [ have to do with anything? It's semantically confusing in my opinion.

Bram Vanroy asked Aug 24 '15 21:08


2 Answers

sapply applies a function to each element of a list. A list is a vector in which each element can take any form.


In the special case of your fileNameSplit list, we know that each element of the list is a character vector with two elements.

> fileNameSplit 
[[1]]
[1] "hello"   "w-rp-al"

[[2]]
[1] "how"   "nez-r"

[[3]]
[1] "do"     "qs-sdz"

[[4]]
[1] "you"    "d-aerd"

[[5]]
[1] "do"    "dse-e"

To extract the first element from each of these character vectors, we have to iterate over the list, which is what

sapply(fileNameSplit, `[`, 1)

does. It may be clearer when written as

sapply(fileNameSplit, function(x) x[1])

The documentation at ?`[` and ?sapply explains why the shorter version works.
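As a quick check, the two forms can be compared directly. This is a minimal sketch that rebuilds fileNameSplit from the question's data; the names short_form and long_form are introduced here purely for illustration.

```r
fileName <- c("hello.w-rp-al", "how.nez-r", "do.qs-sdz", "you.d-aerd", "do.dse-e")
fileNameSplit <- strsplit(fileName, "[.]")

# `[` passed as the function, with 1 forwarded to it as the index
short_form <- sapply(fileNameSplit, `[`, 1)

# the same extraction spelled out with an anonymous function
long_form <- sapply(fileNameSplit, function(x) x[1])

# compare the two results
identical(short_form, long_form)
```

Both calls extract the first string of every list element, so either spelling can be used; the anonymous function is simply more explicit about what is being indexed.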

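We use 1 because indexing starts at 1 in R (unlike many other languages, where it starts at 0). Note also that single brackets on a list return a sub-list, so fileNameSplit[1] is a one-element list rather than the first string.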
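To see why the original attempt in the question did not give the expected result, it may help to look at what each kind of indexing returns. The sketch below uses a shortened copy of the question's data, and the variable names on the left are made up for illustration.

```r
fileName <- c("hello.w-rp-al", "how.nez-r")
fileNameSplit <- strsplit(fileName, "[.]")

zero_idx <- fileNameSplit[0]       # index 0 selects nothing: an empty list
single   <- fileNameSplit[1]       # a list of length 1 that still wraps the character vector
double   <- fileNameSplit[[1]]     # the character vector of the first file name
first    <- fileNameSplit[[1]][1]  # the first string of that character vector

length(zero_idx)
class(single)
double
first
```

Neither fileNameSplit[0] nor fileNameSplit[1] reaches inside the individual character vectors, which is the step that sapply with `[` performs for every element.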

Frank answered Nov 15 '22 23:11


R is very Lisp-like. The symbol [ is actually a function. When you write mylist[1], what actually happens "under the hood" is that mylist becomes the first argument of the [ function, and the numbered or named items inside the square brackets (only one in this instance) are passed as the remaining arguments, so the call becomes:

 `[`(mylist, 1)   # that will also succeed if you type it at the command line
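A small sketch of that equivalence follows. The contents of mylist are invented here for illustration, since the answer does not define them.

```r
# hypothetical example list; its contents are an assumption
mylist <- list(a = 1:3, b = "text")

bracket_syntax  <- mylist[1]        # the usual bracket notation
function_syntax <- `[`(mylist, 1)   # the same operation written as a function call

identical(bracket_syntax, function_syntax)

# other operators can be called the same way when quoted with backticks
sum_result <- `+`(2, 3)
b_value    <- `[[`(mylist, "b")
```

Because `[` is an ordinary function, it can be handed to sapply or lapply like any other function.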

Both sapply and lapply have a trailing triple-dots (...) argument, and whatever is supplied there is forwarded to the function on every call. So the items passed to [ as its first argument are just the elements of fileNameSplit, the 1 is passed along as the second argument each time, and you therefore get the first item of each of those elements. The sapply function creates a series of calls like:

 `[`(mylist[[1]], 1)   # as the first one with 2,3, ... in the [[.]] for succeeding calls

And then returns the results as a vector, a matrix, or a list (depending on whether they are all the same length and on the setting of the "simplify" argument).

Because you used sapply with no "simplify" argument, the default TRUE gets used, so the values get passed to simplify2array and come back to you as a vector instead of the list that would have been returned had you just used lapply.
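The effect of simplification can be observed by running the same extraction in a few ways. This is a sketch using the question's data; the result names are assumptions made for the example.

```r
fileName <- c("hello.w-rp-al", "how.nez-r", "do.qs-sdz", "you.d-aerd", "do.dse-e")
fileNameSplit <- strsplit(fileName, "[.]")

# every result has length 1, so sapply simplifies to a character vector
as_vector <- sapply(fileNameSplit, `[`, 1)

# lapply never simplifies: a list of one-element character vectors
as_list <- lapply(fileNameSplit, `[`, 1)

# every result has length 2, so sapply simplifies to a matrix (one column per file name)
as_matrix <- sapply(fileNameSplit, `[`, 1:2)

# switching simplification off makes sapply return the same list as lapply here
unsimplified <- sapply(fileNameSplit, `[`, 1, simplify = FALSE)

str(as_vector)
dim(as_matrix)
identical(unsimplified, as_list)
```

When a fixed return type matters, vapply(fileNameSplit, `[`, character(1), 1) is a stricter alternative that fails if any result is not a single string.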

IRTFM answered Nov 16 '22 00:11