I am trying to extract the part of the string before the first backslash, but I can't seem to get it to work properly.
I have tried multiple ways of getting it to work, based on the manual page for strsplit and after searching online.
In my actual situation the strings are in a dataframe which I get from a database connection but I can simplify the situation with the following:
> strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\",fixed=TRUE)
[[1]]
[1] "BLAAT1\022E:" "BLAAT2" "BLAAT3"

> strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\",fixed=FALSE)
Error in strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3", "\\", fixed = FALSE) :
  invalid regular expression '\', reason 'Trailing backslash'

> strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\\\",fixed=TRUE)
[[1]]
[1] "BLAAT1\022E:\\BLAAT2\\BLAAT3"

> strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\\\",fixed=FALSE)
[[1]]
[1] "BLAAT1\022E:" "BLAAT2" "BLAAT3"
The expected output would also split on the \ between BLAAT1 and 022E:
Thanks in advance
In R (and elsewhere), the backslash is the "escape" symbol: it is followed by another symbol to indicate a special character. For example, "\t" represents a tab and "\n" a new line (hard return).
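A minimal sketch of how these escape sequences behave in R strings. The variable name `x` is only illustrative. The point is that each escape sequence, including the doubled backslash, is stored as a single character:

```r
# Each escape sequence below is ONE character in the resulting string.
nchar("\t")    # 1: a tab
nchar("\n")    # 1: a newline
nchar("\\")    # 1: a single literal backslash

# print() shows the escaped form; writeLines() shows the actual characters.
x <- "a\\b"
print(x)       # [1] "a\\b"
writeLines(x)  # a\b
nchar(x)       # 3
```

This is why console output such as "BLAAT2\\BLAAT3" can be misleading: the doubled backslash is only how print() displays one backslash.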
To insert a literal backslash into your regular expression pattern, use a double backslash ('\\'). An open parenthesis starts a "subexpression", and a close parenthesis terminates it. An asterisk (*) matches zero or more of the character or expression to its left.
If you use a regex with the strsplit function, a literal backslash can be coded as two literal backslashes (as a literal \ is a special regex metacharacter that is used to form regex escapes, like \d, \w, etc.), but since R string literals support string escape sequences (like "\r" for carriage return, "\n" for a newline char), a literal backslash needs to be defined with a double backslash.
So, "\\" is a literal \, and a regex pattern to match a literal backslash char, being \\, should be coded with 4 backslashes, "\\\\".
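To make the two layers of escaping concrete, here is a small sketch comparing a fixed-string split with a regex split. The names `s`, `fixed_parts` and `regex_parts` are illustrative only:

```r
s <- "BLAAT2\\BLAAT3"   # the string contains ONE backslash

# fixed = TRUE: the pattern is a plain string, so one backslash ("\\") is enough.
fixed_parts <- strsplit(s, "\\", fixed = TRUE)[[1]]

# fixed = FALSE (the default): the pattern is a regex, so the backslash must
# also be escaped for the regex engine, giving four backslashes in the literal.
regex_parts <- strsplit(s, "\\\\")[[1]]

fixed_parts                          # "BLAAT2" "BLAAT3"
identical(fixed_parts, regex_parts)  # TRUE
```

Either form works for a plain backslash. The regex form is only needed when the pattern has to match something more than a fixed string.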
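Here is a regex that you can use: it splits at \ and at any non-printable character. The second alternative is needed because "\022" in your example string is not a backslash followed by 022: R reads it as an octal escape for a single control character (code 18), so there is no backslash at that position to split on: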
strsplit("BLAAT1\022E:\\BLAAT2\\BLAAT3","\\\\|[^[:print:]]",fixed=FALSE)
# [1] "BLAAT1" "E:" "BLAAT2" "BLAAT3"
See IDEONE demo
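Since the question asks only for the part before the first backslash, the same pattern can be used with sub() instead of strsplit(). The following is a sketch under assumptions: the helper name `before_first` and the data frame `df` with its `path` column are hypothetical, and it assumes the real data may contain either a true backslash or the control character seen in the simplified example:

```r
x <- "BLAAT1\022E:\\BLAAT2\\BLAAT3"

# "\022" is parsed as an octal escape: one control character with code 18.
utf8ToInt(substr(x, 7, 7))   # 18

# Hypothetical helper: drop everything from the first backslash OR
# non-printable character onwards, keeping only the leading part.
before_first <- function(s) sub("(\\\\|[^[:print:]]).*$", "", s)
before_first(x)   # "BLAAT1"

# sub() is vectorised, so it can be applied to a whole data frame column.
df <- data.frame(path = c(x, "FOO\\BAR"), stringsAsFactors = FALSE)
df$prefix <- before_first(df$path)
df$prefix         # "BLAAT1" "FOO"
```

If the strings coming from the database contain a real backslash before 022E:, splitting on the backslash alone would already be enough. The control character only appears when the value is typed as an R string literal with a single backslash.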