
Can you use back references in the pattern part of a regular expression?

Is there a way to use a back reference inside the regular expression pattern itself?

Example input string:

Here is "some quoted" text.

Say I want to pull out the quoted text. I could write the following expression:

"([^"]+)"

This regular expression would match "some quoted" and capture some quoted in group 1.

Say I also want it to support single quotes. I could change the expression to:

["']([^"']+)["']

But what if the input string has a mixture of quotes, say Here is 'some quoted" text.? I would not want the regex to match, yet the regex in the second example still does.

What I would like is this: if the opening quote is a double quote, then the closing quote must also be a double quote, and if the opening quote is a single quote, then the closing quote must be a single quote.
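The problem can be reproduced with a short sketch. It uses PHP only because the answers do, and the variable names are illustrative rather than anything from the original post:

```php
<?php
// Pattern from the second example: either quote type may open or close.
$pattern = '/["\']([^"\']+)["\']/';

// Mixed quotes: the string opens with ' but closes with ".
$mixed = 'Here is \'some quoted" text.';

$found = preg_match($pattern, $mixed, $m);

var_dump($found); // int(1): the mismatched pair is still accepted
var_dump($m[1]);  // string(11) "some quoted"
```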

Can I use a back reference to achieve this?


My other related question: Getting text between quotes using regular expression

asked Apr 27 '10 15:04 by Camsoft



2 Answers

You can make use of the regex:

(["'])[^"']+\1
  • () : capturing group
  • [..] : character class, so ["'] matches either " or ' (equivalent to "|')
  • [^..] : negated character class; it matches any character not listed after the ^
  • + : quantifier meaning one or more
  • \1 : back reference to the first group, (["']), so it matches whichever quote character that group matched

In PHP you'd use this as:

preg_match('#(["\'])[^"\']+\1#',$str)
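To see how the back reference behaves, here is a minimal sketch that runs the pattern above against three sample strings. The inputs and variable names are assumptions chosen for illustration:

```php
<?php
// The back-referenced pattern from the answer above.
$pattern = '#(["\'])[^"\']+\1#';

// Hypothetical sample inputs, one per case.
$inputs = [
    'Here is "some quoted" text.',   // matching double quotes
    "Here is 'some quoted' text.",   // matching single quotes
    'Here is \'some quoted" text.',  // mixed quotes
];

$results = [];
foreach ($inputs as $str) {
    // preg_match returns 1 on a match and 0 when nothing matches.
    $results[] = preg_match($pattern, $str);
}

print_r($results); // the values are 1, 1, 0
```

Only the mixed-quote string is rejected, because \1 must repeat the exact character captured by the first group.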
answered Sep 28 '22 16:09 by codaddict


preg_match('/(["\'])([^"\']+)\1/', 'Here is \'quoted text" some quoted text.');

Explanation: in (["'])([^"']+)\1 I placed the opening quote in parentheses. Because this is the first group, its back reference number is 1. Then, where the closing quote would be, I placed \1, which means "whichever character was matched by group 1". The quoted text itself is captured by the second group.
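To actually retrieve the quoted text, pass a third argument to preg_match and read group 2. The following is a sketch only; quoted_text is a hypothetical helper name, not part of PHP:

```php
<?php
$pattern = '/(["\'])([^"\']+)\1/';

// Hypothetical helper: returns the quoted text, or null when the quotes do not pair up.
function quoted_text(string $pattern, string $input): ?string
{
    // $matches[1] holds the quote character, $matches[2] the text between the quotes.
    return preg_match($pattern, $input, $matches) === 1 ? $matches[2] : null;
}

var_dump(quoted_text($pattern, 'Here is "some quoted" text.'));  // string(11) "some quoted"
var_dump(quoted_text($pattern, 'Here is \'some quoted" text.')); // NULL
```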

answered Sep 28 '22 15:09 by webbiedave