I need to read a file in R in which a variable number of columns is separated by the | character. However, a | that is preceded by a \ should not be treated as a separator.
I first thought something like strsplit(x, "[^\\][|]") would work, but the problem is that the character before each pipe is "consumed":
> strsplit("word1|word2|word3\\|aha!|word4", "[^\\][|]")
[[1]]
[1] "word" "word" "word3\\|aha" "word4"
Can anyone suggest a way to do this? Ideally it should be vectorized since the files in question are very large.
I believe this works. It uses the regex from Anirudh's downvoted answer (I'm not sure why it was downvoted; the code as posted doesn't work, but the regex is correct):
strsplit(x, "(?<!\\\\)[|]", perl=TRUE)
## > strsplit(x, "(?<!\\\\)[|]", perl=TRUE)
## [[1]]
## [1] "word1" "word2" "word3\\|aha!" "word4"
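Since the question asks for something vectorized, here is a minimal sketch of the same call applied to several lines at once. strsplit() accepts a character vector and returns one list element per line, and the negative lookbehind (?<!\\\\) is zero-width, so the character before each pipe is not consumed. The sample lines and the names lines, fields and n_fields are made up for illustration; in practice the vector would probably come from something like readLines(). The unescaping step at the end is an optional extra that was not part of the original answer.

```r
# Hypothetical sample input; each element stands for one line of the file.
lines <- c("word1|word2|word3\\|aha!|word4",
           "a|b\\|c",
           "single")

# Split on a pipe only when the character before it is not a backslash.
# The lookbehind matches a position, not a character, so nothing is lost.
fields <- strsplit(lines, "(?<!\\\\)[|]", perl = TRUE)

# Optional: turn each escaped pipe (backslash + pipe) back into a plain pipe.
fields <- lapply(fields, function(f) gsub("\\|", "|", f, fixed = TRUE))

# The column count varies per line, so record it explicitly.
n_fields <- lengths(fields)
```

One caveat: this treats every backslash directly before a pipe as an escape. If the file can also contain an escaped backslash followed by a real separator, this pattern will not split there, and a different approach would be needed.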