Splitting strings with unescaped separator in R

Tags:

regex

r

I need to read a file in R in which a variable number of columns are separated by the | character. However, a | that is preceded by a \ should not be treated as a separator.

I first thought something like strsplit(x, "[^\\][|]") would work, but the problem here is that the character before each pipe is "consumed":

> strsplit("word1|word2|word3\\|aha!|word4", "[^\\][|]")
[[1]]
[1] "word"        "word"        "word3\\|aha" "word4" 

Can anyone suggest a way to do this? Ideally it should be vectorized since the files in question are very large.

asieira asked Jan 14 '23 02:01



1 Answer

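I believe this works. It uses the regex from Anirudh's downvoted answer (not sure why the downvote; that answer doesn't work as posted, but the regex was correct). The negative lookbehind (?<!\\\\) matches a pipe only when it is not preceded by a backslash, and it does not consume the preceding character. Lookbehind requires perl=TRUE.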

x <- "word1|word2|word3\\|aha!|word4"
strsplit(x, "(?<!\\\\)[|]", perl=TRUE)

## > strsplit(x, "(?<!\\\\)[|]", perl=TRUE)
## [[1]]
## [1] "word1"        "word2"        "word3\\|aha!" "word4" 
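Since the question asks for something vectorized: strsplit already accepts a character vector and returns one element per input string, so the same call can be applied to all lines at once. The following is only a minimal sketch; the `lines` vector is made-up sample data (in practice it might come from readLines()), and the unescaping step is an optional extra that the question did not ask for.

```r
# Hypothetical sample input; in practice this could come from readLines().
lines <- c("a|b\\|c|d", "e|f")

# Split every line on pipes that are not preceded by a backslash.
# perl = TRUE is needed for the negative lookbehind.
fields <- strsplit(lines, "(?<!\\\\)[|]", perl = TRUE)

# Optional: turn the escaped "\|" back into a plain "|" inside each field.
# fixed = TRUE treats the pattern as the literal two characters \ and |.
fields <- lapply(fields, function(f) gsub("\\|", "|", f, fixed = TRUE))

# fields[[1]] is c("a", "b|c", "d"); fields[[2]] is c("e", "f")
```

Note that this sketch does not handle an escaped backslash immediately before a separator (for example \\| meaning a literal backslash followed by a real separator); that case would need a more elaborate pattern.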
Tyler Rinker answered Jan 16 '23 18:01
