I've been trying to understand how to deal with the output of strsplit
a bit better. I often have data such as this that I wish to split:
mydata <- c("144/4/5", "154/2", "146/3/5", "142", "143/4", "DNB", "90")
#[1] "144/4/5" "154/2" "146/3/5" "142" "143/4" "DNB" "90"
After splitting, the results are as follows:
strsplit(mydata, "/")
#[[1]]
#[1] "144" "4" "5"
#[[2]]
#[1] "154" "2"
#[[3]]
#[1] "146" "3" "5"
#[[4]]
#[1] "142"
#[[5]]
#[1] "143" "4"
#[[6]]
#[1] "DNB"
#[[7]]
#[1] "90"
I know from the strsplit help page that final empty strings are not produced. Therefore, there will be 1, 2 or 3 elements in each of my results, depending on the number of "/" to split by.
Getting the first element is trivial:
sapply(strsplit(mydata, "/"), "[[", 1)
#[1] "144" "154" "146" "142" "143" "DNB" "90"
But I am not sure how to get the 2nd, 3rd, and so on when there is an unequal number of elements in each result.
sapply(strsplit(mydata, "/"), "[[", 2)
# Error in FUN(X[[4L]], ...) : subscript out of bounds
I would hope to return from a working solution, the following:
#[1] "4" "2" "3" "NA" "4" "NA" "NA"
This is a relatively small example. I could easily write a for loop for these data, but for real data with 1000s of observations to run strsplit on and dozens of elements produced from each, I was hoping to find a more generalizable solution.
(At least regarding 1D vectors) [ seems to return NA when "i > length(x)", whereas [[ returns an error.
x = runif(5)
x[6]
#[1] NA
x[[6]]
#Error in x[[6]] : subscript out of bounds
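Applied to the data in the question, this suggests passing "[" instead of "[[" to sapply, so that positions past the end of a piece come back as NA rather than raising an error. A minimal sketch, assuming the same mydata vector as in the question (the names parts and second are only illustrative):

```r
mydata <- c("144/4/5", "154/2", "146/3/5", "142", "143/4", "DNB", "90")

# Split once and reuse the resulting list
parts <- strsplit(mydata, "/")

# "[" returns NA for an index beyond the length of a vector,
# so short pieces yield NA instead of "subscript out of bounds"
second <- sapply(parts, "[", 2)

# second is now c("4", "2", "3", NA, "4", NA, NA)
print(second)
```

Note that the missing values are real NA values (NA_character_), not the string "NA".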
Digging a bit, do_subset_dflt (i.e. [) calls ExtractSubset, where we notice that NA is returned when a wanted index ("ii") is "> length(x)" (a bit modified to be clean):
if(0 <= ii && ii < nx && ii != NA_INTEGER)
result[i] = x[ii];
else
result[i] = NA_INTEGER;
On the other hand, do_subset2_dflt (i.e. [[) returns an error if the wanted index ("offset") is "> length(x)" (modified a bit to be clean):
if(offset < 0 || offset >= xlength(x)) {
if(offset < 0 && (isNewList(x)) ...
else errorcall(call, R_MSG_subs_o_b);
}
where #define R_MSG_subs_o_b _("subscript out of bounds")
(I'm not sure about the above code snippets but they do seem relevant based on their returns)
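Because [ pads with NA, the same idea may extend to extracting every position at once, which could suit the "dozens of elements" case from the question. The following is only a sketch under that assumption; parts, n and mat are illustrative names, and lengths() requires R 3.2.0 or later:

```r
mydata <- c("144/4/5", "154/2", "146/3/5", "142", "143/4", "DNB", "90")
parts <- strsplit(mydata, "/")

# Largest number of pieces produced by any single string
n <- max(lengths(parts))

# Indexing each piece with 1:n pads the short ones with NA;
# sapply returns one column per input string, so transpose to get one row each
mat <- t(sapply(parts, "[", seq_len(n)))

# Column k of mat holds the k-th element of every split (NA where absent)
print(mat[, 2])
```

Here mat has one row per input string and n columns, so mat[, 2] gives the same values as the single-position call sapply(parts, "[", 2).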