 

Use strsplit with multiple delimiters [duplicate]

Tags: regex, r, strsplit

How can I split these strings

 Chr3:153922357-153944632(-)
 Chr11:70010183-70015411(-)   

into

    Chr3  153922357 153944632 - 
    Chr11 70010183  70015411  -   

I tried strsplit(df$V1, "[[:punct:]]"), but the minus sign (the strand) does not appear in the final result.
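To see why the minus sign disappears: [[:punct:]] matches every punctuation character, including the "-" that encodes the strand, so strsplit() treats it as one more delimiter. The sketch below reproduces that on the sample strings (a "+" strand row is added as an assumption) and shows one possible variant, not taken from either answer, that splits on "-" only when a digit follows it, using a PCRE lookahead via perl = TRUE.

```r
x <- c("Chr3:153922357-153944632(-)",
       "Chr11:70010183-70015411(+)")  # "+" row added for illustration

# [[:punct:]] matches ':', '-', '(' and ')', so the strand sign is consumed
# as a delimiter; only empty strings are left where it used to be
bad <- strsplit(x, "[[:punct:]]")
print(bad[[1]])

# Variant (an assumption, not from the original answers): split on '(', ')'
# and ':', and on '-' only when a digit follows it (the start-end separator).
# The lookahead needs perl = TRUE; it only inspects text to the right of the
# current position, so it behaves as expected inside strsplit()
good <- strsplit(x, "[():]|-(?=\\d)", perl = TRUE)
print(do.call(rbind, good))
```

A lookahead is used here rather than a lookbehind because strsplit() restarts matching on the remaining substring after each split, so a lookbehind cannot see the characters that were already consumed.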

asked Dec 12 '17 at 13:12 by Kryo



2 Answers

You can also try str_split from stringr:

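library(stringr)
# Split on "-" unless it follows "(" (this keeps the strand sign), or on ":", ")" or "(";
# then drop the empty strings left between adjacent delimiters
lapply(str_split(df$V1, "(?<!\\()\\-|[:\\)\\(]"), function(x) x[x != ""])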

Result:

[[1]]
[1] "Chr3"      "153922357" "153944632" "-"        

[[2]]
[1] "Chr11"    "70010183" "70015411" "-"

Data:

df = read.table(text = " Chr3:153922357-153944632(-)
 Chr11:70010183-70015411(-) ")
answered Sep 29 '22 at 22:09 by acylam


How about this in base R, using strsplit and gsub:

# Your sample strings
ss <- c("Chr3:153922357-153944632(-)",
        "Chr11:70010183-70015411(-)")

# Rewrite each string as "name,start,end,strand" via capture groups,
# then split on "," to get a list of character vectors
lst <- lapply(ss, function(x)
    unlist(strsplit(gsub("(.+):(\\d+)-(\\d+)\\((.)\\)", "\\1,\\2,\\3,\\4", x), ",")))


# rbind into a character matrix (wrap in as.data.frame() if a data frame is needed)
do.call(rbind, lst)
#    [,1]    [,2]        [,3]        [,4]
#[1,] "Chr3"  "153922357" "153944632" "-"
#[2,] "Chr11" "70010183"  "70015411"  "-"

This should work for other chromosome names and positive strand features as well.
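The answer above already relies on capture groups; as an alternative sketch (not from the original answer), base R's regexec() and regmatches() can pull the four groups out directly, without the intermediate comma-joined string, and the coordinates can then be converted to integers. The column names chr, start, end and strand are hypothetical labels chosen for illustration, and the "+" row is added as an assumption to exercise the positive strand.

```r
ss <- c("Chr3:153922357-153944632(-)",
        "Chr11:70010183-70015411(+)")  # "+" row added for illustration

# One capture group per field: name, start, end, strand
pattern <- "^(.+):(\\d+)-(\\d+)\\((.)\\)$"

# regexec() records the match and group positions; regmatches() extracts
# them as a list of c(full_match, group1, ..., group4) per input string
parts <- regmatches(ss, regexec(pattern, ss))

# Drop the full match (first element) and stack the rows into a matrix
mat <- do.call(rbind, lapply(parts, `[`, -1))

# Hypothetical column names; start/end converted to integers
res <- data.frame(chr    = mat[, 1],
                  start  = as.integer(mat[, 2]),
                  end    = as.integer(mat[, 3]),
                  strand = mat[, 4],
                  stringsAsFactors = FALSE)
print(res)
```

One caveat with this design: a string that does not match the pattern yields character(0) from regmatches(), which rbind() silently drops, so comparing nrow(res) with length(ss) is a cheap check when the input may be malformed.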

answered Sep 29 '22 at 23:09 by Maurits Evers