
escaping pipe ("|") in a regex

Tags: regex, r

I need to split on words and end marks (certain types of punctuation). Oddly, the pipe ("|") can count as an end mark. I have code that works on end marks until I try to add the pipe. Adding the pipe makes strsplit split on every character. Escaping it causes an error. How can I include the pipe in the regular expression?

x <- "I like the dog|."

strsplit(x, "[[:space:]]|(?=[.!?*-])", perl=TRUE)
#[[1]]
#[1] "I"    "like" "the"  "dog|" "."   

strsplit(x, "[[:space:]]|(?=[.!?*-\|])", perl=TRUE)
#Error: '\|' is an unrecognized escape in character string starting "[[:space:]]|(?=[.!?*-\|"

The outcome I'd like:

#[[1]]
#[1] "I"    "like" "the"  "dog"  "|"  "."  #pipe is an element
Tyler Rinker asked Oct 17 '12 18:10




1 Answer

One way to solve this is to use the \Q...\E notation to remove the special meaning of any of the characters in .... As it says in ?regex:

If you want to remove the special meaning from a sequence of characters, you can do so by putting them between ‘\Q’ and ‘\E’. This is different from Perl in that ‘$’ and ‘@’ are handled as literals in ‘\Q...\E’ sequences in PCRE, whereas in Perl, ‘$’ and ‘@’ cause variable interpolation.

For example:

> strsplit(x, "[[:space:]]|(?=[\\Q.!?*-|\\E])", perl=TRUE)
[[1]]
[1] "I"    "like" "the"  "dog"  "|"    "."
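
A note on why adding the pipe went wrong in the first place, offered as a sketch rather than as part of the original answer: inside a character class `|` is not special, but `-` is. In `[.!?*-|]` the sequence `*-|` reads as a range from `*` to `|`, which covers digits and letters, so nearly every character becomes a split point. `\Q...\E` helps because it makes the hyphen literal. Assuming that reading is right, moving the hyphen to the end of the class should give the same result without `\Q...\E`:

```r
x <- "I like the dog|."

# Hyphen placed last so it is a literal; the pipe needs no escaping inside [...]
res <- strsplit(x, "[[:space:]]|(?=[.!?*|-])", perl = TRUE)
res[[1]]
#[1] "I"    "like" "the"  "dog"  "|"    "."
```

The error in the question is a separate issue: it comes from R's string parser, which does not accept `\|` in a string literal. A backslash meant for the regex engine has to be written as `\\` in R source.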
Joshua Ulrich answered Sep 20 '22 19:09