df <- data.frame(category = c("X", "Y"), sequence = c("AAT.G", "CCG-T"), stringsAsFactors = FALSE)
df
category sequence
1 X AAT.G
2 Y CCG-T
I want to separate the sequence column into 5 columns, one for each character. I tried to do that with tidyr::separate, but it internally uses stringi::stri_split_regex, which doesn't accept an empty string as a separator (even though the sep argument is supposed to take a regex).
library(tidyr)
separate(df, sequence, into = paste0("V", 1:5), sep="")
Error: Values not split into 5 pieces at 1, 2
In addition: Warning messages:
1: In stringi::stri_split_regex(value, sep, n_max) :
empty search patterns are not supported
2: In stringi::stri_split_regex(value, sep, n_max) :
empty search patterns are not supported
Expected output looks like this:
category V1 V2 V3 V4 V5
1 X A A T . G
2 Y C C G - T
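For comparison, base R's strsplit does accept an empty string as the split pattern and breaks each string into single characters. The following is only a minimal sketch of that route, assuming every sequence has the same length; the names chars, mat and res are illustrative and not part of the original post.

```r
# Same example data as in the question
df <- data.frame(category = c("X", "Y"),
                 sequence = c("AAT.G", "CCG-T"),
                 stringsAsFactors = FALSE)

# strsplit() with split = "" breaks each string into single characters
chars <- strsplit(df$sequence, "")

# Stack the per-row character vectors into a matrix
# (assumption: all sequences have the same number of characters)
mat <- do.call(rbind, chars)
colnames(mat) <- paste0("V", seq_len(ncol(mat)))

# Bind the new columns back to the remaining column
res <- cbind(df["category"], as.data.frame(mat, stringsAsFactors = FALSE))
res
```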
You could do this with extract from tidyr:
library(tidyr)
extract(df, sequence, into=paste0('V', 1:5), '(.)(.)(.)(.)(.)')
# category V1 V2 V3 V4 V5
#1 X A A T . G
#2 Y C C G - T
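Each (.) in the pattern above is a capture group that matches exactly one character, and extract fills one new column per group. If the number of characters is not known in advance, the pattern can be built rather than typed out. This is a sketch under the assumption that all sequences share the same length; n, pattern and out are illustrative names.

```r
library(tidyr)

df <- data.frame(category = c("X", "Y"),
                 sequence = c("AAT.G", "CCG-T"),
                 stringsAsFactors = FALSE)

# Number of characters per sequence (assumed equal across rows)
n <- max(nchar(df$sequence))

# Repeat the single-character capture group n times
pattern <- strrep("(.)", n)

# tidyr::extract is called with its namespace in case another
# attached package also exports a function named extract
out <- tidyr::extract(df, sequence,
                      into = paste0("V", seq_len(n)),
                      regex = pattern)
out
```

Rows whose sequence does not match the full pattern (for example, a shorter string) would come back as NA in the new columns.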
Or insert a delimiter between the characters with gsub and pass that delimiter as sep:
library(dplyr)
library(tidyr)
df %>%
  mutate(sequence=gsub('(?<=.)(?=.)', ',', sequence, perl=TRUE)) %>%
  separate(sequence, into=paste0('V', 1:5), sep=",")
# category V1 V2 V3 V4 V5
#1 X A A T . G
#2 Y C C G - T
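To see what the gsub step does on its own, it can help to look at the intermediate strings. The lookbehind (?<=.) requires a character before the position and the lookahead (?=.) requires one after it, so the empty match occurs only between characters and never at the ends. A small sketch, where x and delimited are illustrative names:

```r
x <- c("AAT.G", "CCG-T")

# Insert a comma at every position that has a character on both sides
delimited <- gsub("(?<=.)(?=.)", ",", x, perl = TRUE)
delimited

# The approach assumes the delimiter does not already occur in the data;
# this check returns TRUE if any input string contains a comma
any(grepl(",", x, fixed = TRUE))
```

If the data could contain commas, a different delimiter that is known not to appear in the sequences would be needed.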
Or you can use cSplit from splitstackshape:
library(data.table)       # provides setnames()
library(splitstackshape)  # provides cSplit()
setnames(cSplit(df, 'sequence', '', stripWhite=FALSE),
         2:6, paste0('V', 1:5))[]
# category V1 V2 V3 V4 V5
#1: X A A T . G
#2: Y C C G - T
sep can also be an integer vector, in which case it is interpreted as the character positions to split at. sep=1:4 would be sufficient, but 1:5 works too and looks a bit better.
df %>% separate(sequence, into = paste0("V", 1:5), sep = 1:5)
giving:
category V1 V2 V3 V4 V5
1 X A A T . G
2 Y C C G - T
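The positions do not have to be hard-coded. As a sketch, assuming all sequences have the same length, they can be derived from nchar; this version uses one position fewer than the number of output columns (the sep=1:4 form mentioned above). The names n and out are illustrative.

```r
library(tidyr)

df <- data.frame(category = c("X", "Y"),
                 sequence = c("AAT.G", "CCG-T"),
                 stringsAsFactors = FALSE)

# Number of characters per sequence (assumed equal across rows)
n <- max(nchar(df$sequence))

# Split after positions 1, 2, ..., n - 1 to get n single-character columns
out <- separate(df, sequence,
                into = paste0("V", seq_len(n)),
                sep = seq_len(n - 1))
out
```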