How can I split this
Chr3:153922357-153944632(-)
Chr11:70010183-70015411(-)
into
Chr3 153922357 153944632 -
Chr11 70010183 70015411 -
I tried strsplit(df$V1, "[[:punct:]]"), but the minus sign does not appear in the final result.
You can also try str_split from stringr:
library(stringr)
lapply(str_split(df$V1, "(?<!\\()\\-|[:\\)\\(]"), function(x) x[x != ""])
Result:
[[1]]
[1] "Chr3" "153922357" "153944632" "-"
[[2]]
[1] "Chr11" "70010183" "70015411" "-"
Data:
df = read.table(text = " Chr3:153922357-153944632(-)
Chr11:70010183-70015411(-) ")
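If you would rather not add a package, here is a rough base R sketch of the same idea. The original attempt loses the strand because "-" is itself a punctuation character, so `[[:punct:]]` uses it as a delimiter. This variant uses a lookahead instead of the lookbehind above: it splits on "-" only when a digit follows. As far as I can tell, base strsplit searches only the remainder of the string after each match, so a lookbehind may not see the preceding "(" there. It assumes the end coordinate always starts with a digit; the names `x`, `naive` and `parts` are just for illustration.

```r
x <- c("Chr3:153922357-153944632(-)",
       "Chr11:70010183-70015411(-)")

# Naive attempt: every punctuation character is a delimiter,
# so the strand sign is swallowed along with ":" and the brackets
naive <- strsplit(x[1], "[[:punct:]]")[[1]]

# Split on "-" only when a digit follows, or on ":", "(" and ")".
# perl = TRUE is needed for the (?=...) lookahead.
parts <- strsplit(x, "-(?=\\d)|[():]", perl = TRUE)

# Defensive: drop empty strings in case a variant of the input produces any
parts <- lapply(parts, function(p) p[p != ""])

print(parts)
```

Because the "-" inside the brackets is followed by ")" rather than a digit, it survives as its own element.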
How about this in base R, using strsplit and gsub:
# Your sample strings
ss <- c("Chr3:153922357-153944632(-)",
"Chr11:70010183-70015411(-)")
# Split items as list of vectors
lst <- lapply(ss, function(x)
unlist(strsplit(gsub("(.+):(\\d+)-(\\d+)\\((.)\\)", "\\1,\\2,\\3,\\4", x), ",")))
# rbind to dataframe if necessary
do.call(rbind, lst)
#      [,1]    [,2]        [,3]        [,4]
# [1,] "Chr3"  "153922357" "153944632" "-"
# [2,] "Chr11" "70010183"  "70015411"  "-"
This should work for other chromosome names and positive strand features as well.
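If you want a data frame with numeric coordinates rather than a character matrix, one possible variation is strcapture from the utils package (available since R 3.4.0), which applies the same kind of capture-group pattern and converts each group to the column type given in a prototype. This is only a sketch: the column names and the third, plus-strand record are made up for illustration and are not from the question.

```r
ss <- c("Chr3:153922357-153944632(-)",
        "Chr11:70010183-70015411(-)",
        "ChrX:100-200(+)")  # hypothetical plus-strand record

# One capture group per output column; proto fixes each column's name and type
out <- strcapture(
  "^(.+):(\\d+)-(\\d+)\\((.)\\)$",
  ss,
  proto = data.frame(chr = character(), start = integer(),
                     end = integer(), strand = character(),
                     stringsAsFactors = FALSE)
)

str(out)
```

Strings that do not match the pattern come back as a row of NA values, which makes malformed records easy to spot.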