 

sapply() with strsplit in R

Tags: r, strsplit

I found this code:

string <- c("G1:E001", "G2:E002", "G3:E003")
sapply(strsplit(string, ":"), "[", 2)
# [1] "E001" "E002" "E003"

Clearly, strsplit(string, ":") returns a list of length 3, where each component i is a character vector of length 2 containing Gi and E00i.

But why do the two extra arguments "[" and 2 have the effect of selecting only the E00i values? As far as I can see, the only arguments accepted by the function are:

sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE) 
asked Jul 05 '15 21:07 by Leonardo


People also ask

What does Strsplit do in R?

strsplit() is an R function that splits the elements of a character vector into substrings according to its split argument. Here x is the character vector to split and split is the pattern to split on (a regular expression by default, or a literal string with fixed = TRUE).

How do you split a string in R?

To split a string in R, use the built-in strsplit() function. It splits each element of a character vector into substrings and returns a list with one element per input string; each list element is a character vector holding the pieces of the corresponding string.
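A small sketch of that list return value, using a made-up input in which the two strings split into different numbers of pieces:

```r
# Hypothetical input: the strings contain different numbers of ":" separators
parts <- strsplit(c("a:b:c", "d"), ":")

# One list element per input string, each a character vector of the pieces
length(parts)   # [1] 2
parts[[1]]      # [1] "a" "b" "c"
parts[[2]]      # [1] "d"
```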


3 Answers

You could use sub() to get the expected output instead of using strsplit()/sapply():

 sub('.*:', '', string)
 #[1] "E001" "E002" "E003"
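The pattern '.*:' works because .* is greedy: it matches everything up to the last colon. A short sketch with a hypothetical string containing two colons shows how that differs from a pattern that stops at the first colon:

```r
string <- c("G1:E001", "G2:E002", "G3:E003")

# Greedy: ".*:" consumes everything up to the LAST colon
sub(".*:", "", string)        # [1] "E001" "E002" "E003"
sub(".*:", "", "a:b:c")       # [1] "c"

# "^[^:]*:" only consumes up to the FIRST colon
sub("^[^:]*:", "", "a:b:c")   # [1] "b:c"
```

For the strings in the question, which contain a single colon, both patterns give the same result.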

Regarding your code, the output of strsplit is a list, and a list can be processed with apply-family functions such as sapply/lapply/vapply/rapply. In this case, each list element has a length of 2, and we are selecting the second element.

strsplit(string, ":")
#[[1]]
#[1] "G1"   "E001"

#[[2]]
#[1] "G2"   "E002"

#[[3]]
#[1] "G3"   "E003"

lapply(strsplit(string, ":"), `[`, 2)
#[[1]]
#[1] "E001"

#[[2]]
#[1] "E002"

#[[3]]
#[1] "E003"
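Since vapply is one of the apply-family functions mentioned above, here is a sketch of the same extraction with it. FUN.VALUE = character(1) declares that every call must return exactly one character value, and the trailing 2 is again forwarded to `[` through `...`:

```r
string <- c("G1:E001", "G2:E002", "G3:E003")

# vapply(X, FUN, FUN.VALUE, ...): each call is `[`(x, 2),
# and vapply checks that it returns a single character value
res <- vapply(strsplit(string, ":"), `[`, character(1), 2)
res
# [1] "E001" "E002" "E003"
```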

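In the case of sapply, the default option is simplify = TRUE, which collapses the length-1 results into a character vector; with simplify = FALSE, sapply returns the same list as lapply: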

 sapply(strsplit(string, ":"), `[`, 2, simplify=FALSE)
#[[1]]
#[1] "E001"

#[[2]]
#[1] "E002"

#[[3]]
#[1] "E003"
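To see what simplify = TRUE does when each call returns more than one value, here is a sketch that keeps both pieces (index 1:2) instead of only the second. sapply then stacks the length-2 results into a 2 x 3 character matrix, one column per input string:

```r
string <- c("G1:E001", "G2:E002", "G3:E003")

# Each call returns a length-2 vector, so sapply simplifies to a matrix
m <- sapply(strsplit(string, ":"), `[`, 1:2)
dim(m)   # [1] 2 3
m[2, ]   # [1] "E001" "E002" "E003"
```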

The [ can be replaced by an anonymous function call:

sapply(strsplit(string, ":"), function(x) x[2], simplify=FALSE)
#[[1]]
#[1] "E001"

#[[2]]
#[1] "E002"

#[[3]]
#[1] "E003"
akrun answered Oct 05 '22 12:10


Look at the docs for ?sapply:

 sapply(X, FUN, ..., simplify = TRUE, USE.NAMES = TRUE)

 FUN: the function to be applied to each element of ‘X’: see
      ‘Details’.  In the case of functions like ‘+’, ‘%*%’, the
      function name must be backquoted or quoted.

 ...: optional arguments to ‘FUN’.

Therein lies your answer. In your case, FUN is [. The "optional arguments to FUN" are just 2, since 2 gets matched to ... in your call. So sapply calls [ with each element of the list as the first argument and 2 as the second. Consider:

x <- c("G1", "E001")   # this is the result of `strsplit` on the first value

Then:

`[`(x, 2)      # equivalent to x[2]
# [1] "E001"

This is what sapply is doing in your example, except that it applies [ to every length-2 character vector returned by strsplit.
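A simplified, hypothetical re-implementation makes the forwarding of `...` explicit (my_sapply is not part of R and is not how R implements sapply). Like the real sapply, it looks FUN up with match.fun(), which is why the quoted "[" in the question works just as well as the backquoted `[`. The final unlist() is a crude stand-in for sapply's simplification that only matches it when every result has length 1:

```r
# Hypothetical, simplified stand-in for sapply(); not R's actual implementation
my_sapply <- function(X, FUN, ...) {
  FUN <- match.fun(FUN)            # turns the string "[" into the `[` function
  out <- vector("list", length(X))
  for (i in seq_along(X)) {
    out[[i]] <- FUN(X[[i]], ...)   # extra arguments such as 2 are passed on here
  }
  unlist(out)                      # crude simplification: fine for length-1 results
}

string <- c("G1:E001", "G2:E002", "G3:E003")
my_sapply(strsplit(string, ":"), "[", 2)
# [1] "E001" "E002" "E003"
```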

BrodieG answered Oct 05 '22 12:10


Because the output of strsplit() is a list. Here [ is the FUN in sapply(): sapply() applies it to each member of the list returned by strsplit() and passes the additional argument 2, so the second item of every member is selected.

strsplit(string, ":")
#[[1]]
#[1] "G1"   "E001"
#
#[[2]]
#[1] "G2"   "E002"
#
#[[3]]
#[1] "G3"   "E003"
#
str(strsplit(string, ":"))
#List of 3
# $ : chr [1:2] "G1" "E001"
# $ : chr [1:2] "G2" "E002"
# $ : chr [1:2] "G3" "E003"
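One consequence of indexing every member with 2: if a string contains no ":", its list member has length 1, and `[`(x, 2) returns NA for it. A sketch with a hypothetical extra element "G4":

```r
# "G4" has no ":", so strsplit returns just "G4" for that element
string2 <- c("G1:E001", "G4")
str(strsplit(string2, ":"))

# Indexing past the end of a character vector gives NA
res <- sapply(strsplit(string2, ":"), "[", 2)
res
# [1] "E001" NA
```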
RHertel answered Oct 05 '22 11:10