I have a data.table with one string column. I'd like to use strsplit to create another column containing the part of each string before the underscore.
library(data.table)
dat <- data.table(labels = c('a_1', 'b_2', 'c_3', 'd_4'))
The output I want is:
labels sub_label
a_1    a
b_2    b
c_3    c
d_4    d
I've tried the following, but neither seems to work.
dat %>%
  mutate(
    sub_labels = strsplit(as.character(labels), "_")[[1]][1]
  )
# gives a column whose values are all "a"
And this one, which seems logical to me,
dat %>%
  mutate(
    sub_labels = sapply(strsplit(as.character(labels), "_"), function(x) x[[1]][1])
  )
gives the following error:
Error: Don't know how to handle type pairlist
I saw another post where calling paste with collapse on the output of strsplit worked, so I don't understand why subsetting inside an anonymous function causes problems. Thanks for any explanation.
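Why the first attempt yields only "a": strsplit() is vectorised and returns a list with one element per input string, so [[1]][1] extracts the first piece of the first string only, and mutate() recycles that single value to every row. A minimal sketch, assuming the data.table package is installed, that instead takes the first piece of every list element, once with base vapply() and once with data.table's tstrsplit() (the sub_label_t column name is just for comparison):

```r
library(data.table)

dat <- data.table(labels = c('a_1', 'b_2', 'c_3', 'd_4'))

# strsplit() returns a list with one character vector per input string:
# list(c("a", "1"), c("b", "2"), c("c", "3"), c("d", "4"))
pieces <- strsplit(dat$labels, "_", fixed = TRUE)

# [[1]][1] only looks at the first string, giving the single value "a",
# which is what gets recycled down the whole column in the first attempt
first_only <- pieces[[1]][1]

# Applying `[` with index 1 to every list element picks the first piece of each string
dat[, sub_label := vapply(pieces, `[`, character(1), 1)]

# data.table's tstrsplit() transposes the pieces, so [[1]] is the
# vector of first pieces for every row
dat[, sub_label_t := tstrsplit(labels, "_", fixed = TRUE)[[1]]]
```

vapply() is used rather than sapply() because it guarantees a character vector of the right length, even for empty input.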
tidyr::separate can help here:
> dat %>% separate(labels, c("first", "second") )
first second
1: a 1
2: b 2
3: c 3
4: d 4
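By default separate() splits on any run of non-alphanumeric characters, which is why "_" works here without being named. A sketch, assuming tidyr and data.table are installed, that names the delimiter explicitly, drops the second piece by passing NA in into, and keeps the original column so the result has the labels / sub_label layout asked for:

```r
library(data.table)
library(tidyr)

dat <- data.table(labels = c('a_1', 'b_2', 'c_3', 'd_4'))

# sep = "_" makes the delimiter explicit; NA in `into` omits the second
# piece from the output; remove = FALSE keeps the original labels column
res <- separate(dat, labels, into = c("sub_label", NA),
                sep = "_", remove = FALSE)
```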
Another method uses purrr's map_chr, which I've found useful when I don't want to bother with separating and uniting (e.g., when using the results in sprintf with other strings):
library(dplyr)   # %>%, mutate(); also re-exports tibble()
library(purrr)   # map_chr()
tibble(labels = c('a_1', 'b_2', 'c_3', 'd_4')) %>%
  mutate(sub_label = stringr::str_split(labels, "_") %>% map_chr(., 1))
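The sprintf() use case mentioned above can be sketched as follows, assuming dplyr, purrr and stringr are installed; the number and description columns are hypothetical, showing the split pieces going straight into sprintf() with no separate()/unite() round trip:

```r
library(dplyr)
library(purrr)
library(stringr)

dat <- tibble(labels = c('a_1', 'b_2', 'c_3', 'd_4'))

res <- dat %>%
  mutate(
    # map_chr(.x, 1) extracts element 1 of each list element as a character vector
    sub_label = map_chr(str_split(labels, "_"), 1),
    number    = map_chr(str_split(labels, "_"), 2),
    # hypothetical column: combine the pieces with other strings directly
    description = sprintf("item %s, number %s", sub_label, number)
  )
```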
In my experience this method can be noticeably faster than separate, especially on longer datasets. With 100 strings separate barely beats map_chr, but with 1000 strings it falls behind in most runs (in the run below the mean time is about 25% lower for map_chr; I'm not sure what causes its 41.5 ms max).
> microbenchmark::microbenchmark(
+ d.filtered_reads %>% head(1000) %>%
+ mutate(name = stringr::str_split(Header, " ") %>% map_chr(., 1)) %>%
+ select(-Header),
+ d.filtered_reads %>% head(1000) %>%
+ separate(Header, into = c("name","index"), sep = " ") %>%
+ select(-"index")
+ )
Unit: milliseconds
expr                     min       lq     mean   median       uq       max neval
str_split + map_chr 5.333891 5.817589 6.292954 5.935706 6.059031 41.530089   100
separate            7.517316 8.031325 8.399471 8.500359 8.647468  9.855612   100
(expr abbreviated: row 1 is the mutate/str_split/map_chr pipeline and row 2 the separate pipeline from the call above)
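d.filtered_reads and its Header column are the author's own data, so the benchmark above cannot be rerun as written. A hedged, reproducible sketch on synthetic headers (the make_headers() helper and the "readN k" header format are hypothetical stand-ins) that first checks both pipelines give the same result and then times them; absolute timings will vary with machine and package versions:

```r
library(dplyr)
library(purrr)
library(stringr)
library(tidyr)

# Hypothetical stand-in for d.filtered_reads: n space-separated headers
make_headers <- function(n) {
  tibble(Header = paste0("read", seq_len(n), " ", sample(1:9, n, replace = TRUE)))
}

via_map_chr <- function(d) {
  d %>%
    mutate(name = map_chr(str_split(Header, " "), 1)) %>%
    select(-Header)
}

via_separate <- function(d) {
  d %>%
    separate(Header, into = c("name", "index"), sep = " ") %>%
    select(-index)
}

set.seed(1)
d <- make_headers(1000)

# Both approaches should yield the same name column
stopifnot(identical(via_map_chr(d)$name, via_separate(d)$name))

# Rough timings; microbenchmark::microbenchmark() gives a fuller distribution
system.time(for (i in 1:50) via_map_chr(d))
system.time(for (i in 1:50) via_separate(d))
```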