I know I've come across this problem before, but I'm having a bit of a mental block at the moment, and as I can't find it on SO, I'll post it here so I can find it next time.
I have a dataframe that contains a field representing an ID label. This label has two parts, an alpha prefix and a numeric suffix. I want to split it apart and create two new fields containing these values.
df <- structure(list(lab = c("N00", "N01", "N02", "B00", "B01", "B02",
"Z21", "BA01", "NA03")), .Names = "lab", row.names = c(NA, -9L),
class = "data.frame")
df$pre <- strsplit(df$lab, "[0-9]+")
df$suf <- strsplit(df$lab, "[A-Z]+")
Which gives

   lab pre  suf
1  N00   N , 00
2  N01   N , 01
3  N02   N , 02
4  B00   B , 00
5  B01   B , 01
6  B02   B , 02
7  Z21   Z , 21
8 BA01  BA , 01
9 NA03  NA , 03
So the first strsplit works fine, but the second returns a list in which each element has two parts, an empty string and the result I want, and both of them get stuffed into the dataframe column.
How can I select the second sub-element from each element of the list? (Or is there a better way to do this?)
In R, elements of a list or vector are accessed with one-based indices, and a negative index removes that element rather than counting from the end. Since the value we want is the second element of each split result, use 2 as the index.
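A quick check of how indexing behaves on one of the split results from the question (a small illustration, using the label "N00"):

```r
# Split one label the way the question does; the match on the leading "N"
# produces an empty string before the digits
x <- strsplit("N00", "[A-Z]+")[[1]]
x                 # c("", "00")

x[2]              # "00": indices start at 1, so 2 is the second element
x[-1]             # "00": a negative index drops element 1, it does not count from the end
x[length(x)]      # "00": use length() when you really want the last element
```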
To select the second element of each list item:
R> sapply(df$suf, "[[", 2)
[1] "00" "01" "02" "00" "01" "02" "21" "01" "03"
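To store the result back as an ordinary character column rather than a list, one option (a sketch, not the only way) is vapply(), which also checks that every extracted piece is a single string. The data frame is rebuilt here so the snippet runs on its own:

```r
df <- data.frame(
  lab = c("N00", "N01", "N02", "B00", "B01", "B02", "Z21", "BA01", "NA03"),
  stringsAsFactors = FALSE
)

# Splitting on letters leaves an empty string first, so take element 2
df$suf <- vapply(strsplit(df$lab, "[A-Z]+"), `[[`, character(1), 2)
# Splitting on digits leaves the prefix as the only element, so take element 1
df$pre <- vapply(strsplit(df$lab, "[0-9]+"), `[[`, character(1), 1)

str(df)
```

Unlike sapply(), vapply() stops with an error instead of silently returning a list if some label splits into an unexpected shape.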
An alternative approach using regular expressions:
df$pre <- sub("^([A-Z]+)[0-9]+", "\\1", df$lab)
df$suf <- sub("^[A-Z]+([0-9]+)", "\\1", df$lab)
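The same two capture groups can also be extracted in a single pass with utils::strcapture() (available since R 3.4.0). This is a sketch assuming every label is letters followed by digits; labels that do not match the anchored pattern come back as NA in both columns:

```r
df <- data.frame(
  lab = c("N00", "N01", "N02", "B00", "B01", "B02", "Z21", "BA01", "NA03"),
  stringsAsFactors = FALSE
)

# One pattern, two capture groups: letters, then digits, anchored at both ends.
# proto gives the names and types of the resulting columns.
parts <- utils::strcapture(
  "^([A-Z]+)([0-9]+)$", df$lab,
  proto = data.frame(pre = character(), suf = character(),
                     stringsAsFactors = FALSE)
)

df$pre <- parts$pre
df$suf <- parts$suf
df
```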
With purrr::map_chr() this would be:
library(purrr)
df$suf %>% map_chr(2)
For further info, see the purrr::map documentation.
First of all: if you run str(df), you'll see that df$pre is a list. I think you want a vector (but I might be wrong).
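To see this directly, and one way to turn the list column into a plain vector when every split yields exactly one piece (a sketch; unlist() would misalign rows if some element had zero or several pieces):

```r
df <- data.frame(
  lab = c("N00", "N01", "N02", "B00", "B01", "B02", "Z21", "BA01", "NA03"),
  stringsAsFactors = FALSE
)

df$pre <- strsplit(df$lab, "[0-9]+")
is.list(df$pre)       # TRUE: strsplit() always returns a list

# Flatten to a character vector; safe here only because each element has length 1
df$pre <- unlist(df$pre)
is.character(df$pre)  # TRUE
```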
Back to the problem: in this case I would use gsub:
df$pre <- gsub("[0-9]", "", df$lab)
df$suf <- gsub("[A-Z]", "", df$lab)
This guarantees that both columns are character vectors, but it fails if a label does not follow the letters-then-digits pattern (e.g. 'AB01B').
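Here is what that failure looks like, plus a stricter variant that only splits labels matching the expected shape and returns NA otherwise. The stricter version is an illustrative sketch, not part of the original answer:

```r
labs <- c("N00", "BA01", "AB01B")

# gsub() strips characters wherever they occur, so a malformed label
# is split silently: "AB01B" gives prefix "ABB" and suffix "01"
gsub("[0-9]", "", labs)   # "N"  "BA" "ABB"
gsub("[A-Z]", "", labs)   # "00" "01" "01"

# Stricter: only split labels that are exactly letters followed by digits
ok  <- grepl("^[A-Z]+[0-9]+$", labs)
pre <- ifelse(ok, sub("^([A-Z]+)[0-9]+$", "\\1", labs), NA_character_)
suf <- ifelse(ok, sub("^[A-Z]+([0-9]+)$", "\\1", labs), NA_character_)
```

The NA values make malformed labels easy to find afterwards, e.g. with labs[is.na(pre)].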