
Accessing element of a split string in R

Tags: string, r

If I have a string,

x <- "Hello World"

How can I access the second word, "World", after splitting the string with

x <- strsplit(x, " ")

x[[2]] does not do anything.

Asked by roc11111111 on Dec 26 '16 at 21:12


1 Answer

As mentioned in the comments, it's important to realise that strsplit returns a list object. Since your example splits only a single item (a vector of length 1), your list has length 1. I'll explain with a slightly different example, using an input vector of length 3 (3 text items to split):

input <- c( "Hello world", "Hi there", "Back at ya" )

x <- strsplit( input, " " )

> x
[[1]]
[1] "Hello" "world"

[[2]]
[1] "Hi"    "there"

[[3]]
[1] "Back" "at"   "ya"  

Notice that the returned list has 3 elements, one for each element of the input vector, and each list element holds the pieces produced by the strsplit call. We can retrieve any of these list elements using [[. This is what your x[[2]] call was attempting, but your list had only one element, so there was no second element to return:

> x[[1]]
[1] "Hello" "world"

> x[[3]]
[1] "Back" "at"   "ya" 

Now we can get the second part of any of those list elements by appending a [ call:

> x[[1]][2]
[1] "world"

> x[[3]][2]
[1] "at"
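
Applied to the single string from the question, the same indexing looks like the sketch below. The variable names parts, words and second are illustrative and do not come from the original post.

```r
# The single string from the question
x <- "Hello World"

# strsplit() returns a list with one element per input string,
# so a single string gives a list of length 1
parts <- strsplit(x, " ")
length(parts)

# Take the first (and only) list element, then its second word
words <- parts[[1]]
second <- words[2]
second
```

Here parts[[1]] is the character vector of words, and [2] picks "World" out of it.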

Each of these calls returns the second item from one list element (note that the "Back at ya" input returns "at" here). To do this for all list elements at once, use a function from the apply family. sapply returns a vector, which is convenient in this case:

> sapply( x, "[", 2 )
[1] "world" "there" "at"

The last argument to sapply here (2) is passed on to the [ operator, so the operation element[2] is applied to every list element.
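
As a sketch of what that call does, the same result can be written with an explicit anonymous function. The vapply variant is an alternative that the original answer does not use; it additionally checks that each result is a single character string. The names second_sapply, second_explicit, second_vapply and one_word are illustrative.

```r
input <- c("Hello world", "Hi there", "Back at ya")
x <- strsplit(input, " ")

# Form used above: "[" is called as `[`(element, 2) for every list element
second_sapply <- sapply(x, "[", 2)

# Equivalent form with an explicit anonymous function
second_explicit <- sapply(x, function(words) words[2])

# vapply() declares the expected result type: one character string per element
second_vapply <- vapply(x, function(words) words[2], character(1))

# An input with fewer than two words yields NA rather than an error
one_word <- sapply(strsplit("Hello", " "), "[", 2)
```

The NA case is worth keeping in mind when the inputs may contain strings with only one word.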

If instead of the second item, you'd like the last item of each list element, we can use tail within the sapply call instead of [:

> sapply( x, tail, 1 )
[1] "world" "there" "ya"

This time, tail( element, 1 ) is applied to every list element, giving us the last item of each.
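
For comparison, here is a minimal sketch of the same last-word extraction that indexes each vector at its own length instead of calling tail. This is an alternative rather than part of the original answer, and the names last_tail and last_index are illustrative.

```r
input <- c("Hello world", "Hi there", "Back at ya")
x <- strsplit(input, " ")

# tail(words, 1) keeps the final element of each character vector
last_tail <- sapply(x, tail, 1)

# Same result by indexing each vector at its own length
last_index <- vapply(x, function(words) words[length(words)], character(1))
```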

My preferred way to apply actions like these is with the magrittr pipe (%>%), which requires the magrittr package to be loaded. For the second word:

library(magrittr)

x <- input %>%
    strsplit( " " ) %>%
    sapply( "[", 2 )

> x
[1] "world" "there" "at"

Or for the last word:

x <- input %>%
    strsplit( " " ) %>%
    sapply( tail, 1 )

> x
[1] "world" "there" "ya" 
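
If loading magrittr is not desired, R 4.1.0 and later include a native pipe, |>, that can express the same chain. A minimal sketch, assuming such an R version is available; the names second_words and last_words are illustrative.

```r
input <- c("Hello world", "Hi there", "Back at ya")

# Second word of each string, using the base R pipe (requires R >= 4.1.0)
second_words <- input |>
    strsplit(" ") |>
    sapply("[", 2)

# Last word of each string
last_words <- input |>
    strsplit(" ") |>
    sapply(tail, 1)
```

Each step receives the result of the previous one as its first argument, so the first chain is equivalent to sapply(strsplit(input, " "), "[", 2).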

Answered by rosscova on Oct 12 '22 at 11:10