I need to split on words and end marks (certain types of punctuation). Oddly, the pipe ("|") can count as an end mark. I have code that works on end marks until I try to add the pipe. Adding the pipe makes strsplit split on
every character. Escaping it causes an error. How can I include the pipe in the regular expression?
x <- "I like the dog|."
strsplit(x, "[[:space:]]|(?=[.!?*-])", perl=TRUE)
#[[1]]
#[1] "I" "like" "the" "dog|" "."
strsplit(x, "[[:space:]]|(?=[.!?*-\|])", perl=TRUE)
#Error: '\|' is an unrecognized escape in character string starting "[[:space:]]|(?=[.!?*-\|"
The outcome I'd like:
#[[1]]
#[1] "I" "like" "the" "dog" "|" "." #pipe is an element
In a regex, \ escapes special characters so that they are treated as literals, as in \+ or \* . To escape | in a regex we therefore need \| , but in a string literal the backslash itself must be escaped, so we write it as "\\|" .
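As a minimal sketch of the double-backslash rule in R (the sample string "a|b|c" and the variable names are made up for illustration):

```r
# The two-character regex \| has to be typed as "\\|" in an R string literal.
pattern <- "\\|"
pattern_length <- nchar(pattern)   # counts the backslash and the pipe

# Split on a literal pipe using the escaped regex.
parts <- strsplit("a|b|c", pattern)[[1]]
print(parts)

# Alternative: fixed = TRUE treats the pattern as plain text, so no escape is needed.
parts_fixed <- strsplit("a|b|c", "|", fixed = TRUE)[[1]]
print(parts_fixed)
```

Both calls should give the same three pieces. Note that this applies to a pipe outside a bracket expression; inside [...] the pipe has no special meaning.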
The \ is known as the escape character: it restores the literal meaning of the character that follows it. Characters such as * , + , ? (occurrence indicators) and ^ , $ (position anchors) have special meaning in regex, so you need to escape them to match them literally.
A pipe symbol allows regular expression components to be logically ORed. For example, a pattern such as ^(Germany|Netherlands) matches lines that start with the word "Germany" or the word "Netherlands". Note that parentheses are used to group the two alternatives.
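A small sketch of alternation in R, assuming a made-up vector of sample lines (text_lines is a hypothetical name, not from the original post):

```r
# Hypothetical sample lines for illustration.
text_lines <- c("Germany borders France",
                "Netherlands exports tulips",
                "In Germany it rains")

# ^ anchors at the start of the string; the parentheses group the two
# alternatives so that the anchor applies to both of them.
starts_with_country <- grepl("^(Germany|Netherlands)", text_lines)
print(starts_with_country)   # TRUE TRUE FALSE
```

Without the parentheses, ^Germany|Netherlands would anchor only the first alternative, so "Netherlands" could match anywhere in the line.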
The same applies in Python regular expressions: you can remove the special meaning of the pipe symbol by prefixing it with a backslash: \| . This way, you can match a literal pipe character in a given string.
One way to solve this is to use the \Q...\E notation to remove the special meaning of any of the characters in ... . As it says in ?regex:
If you want to remove the special meaning from a sequence of characters, you can do so by putting them between ‘\Q’ and ‘\E’. This is different from Perl in that ‘$’ and ‘@’ are handled as literals in ‘\Q...\E’ sequences in PCRE, whereas in Perl, ‘$’ and ‘@’ cause variable interpolation.
For example:
> strsplit(x, "[[:space:]]|(?=[\\Q.!?*-|\\E])", perl=TRUE)
[[1]]
[1] "I" "like" "the" "dog" "|" "."
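A likely reason the original attempt split on every character: inside a bracket expression, a hyphen between two characters denotes a range, so *-| covers everything from "*" to "|" in ASCII order, which includes digits and letters. The sketch below, offered as an alternative under that assumption and not as part of the answer above, moves the hyphen to the end of the class, where it is literal. The pipe needs no escape inside [...].

```r
x <- "I like the dog|."

# "*-|" inside [...] is a range from "*" (ASCII 42) to "|" (ASCII 124),
# so an ordinary letter matches it.
range_hits_letter <- grepl("[*-|]", "a", perl = TRUE)
print(range_hits_letter)

# Hyphen placed last is a literal hyphen; the pipe is literal inside a class.
hyphen_last <- strsplit(x, "[[:space:]]|(?=[.!?*|-])", perl = TRUE)

# The \Q...\E version from the answer, for comparison.
quoted <- strsplit(x, "[[:space:]]|(?=[\\Q.!?*-|\\E])", perl = TRUE)

# Both classes contain the same six characters, so the results should agree.
same_result <- identical(hyphen_last, quoted)
print(same_result)
```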