
Regular expression that matches a text if it contains a string ONLY once

Tags: string, regex

I would like to write a regular expression that matches a text if it contains a string ONLY once. The text must contain <scr> only once. Here are some examples:

hello-<scr>Keephello-endofstring //ok; <scr> occurs once

test-<scr>bla<scr>bla-end //NOT ok; <scr> occurs 2 times

hello-Keephello-end //NOT ok; <scr> doesn't occur

I tried with the following regex:

((?:(?<!<scr>).)*<scr>(?:(?!<scr>).)*)

The first negative lookbehind is meant to ensure that <scr> doesn't occur before the match.
Then <scr> must follow.
After that, a negative lookahead is meant to ensure that no further <scr> follows.

It does not work.
How can this be done with a regex? An explanation would be appreciated.

asked May 09 '15 20:05 by deemon

1 Answer

To check that a text contains a substring exactly once, match all characters that do not start a <scr> sequence, then match <scr> itself, use a negative lookahead to check that no further <scr> follows, and finally consume the remaining characters. The line/string anchors ^ and $ are also required:

^(?:(?!<scr>).)*<scr>(?!.*<scr>).*$

EXPLANATION:

  • ^ - Start of the string (or the start of each line when the m multiline option is on)
  • (?:(?!<scr>).)* - A non-capturing group that matches each character (other than a newline; to match newlines too, add the s singleline option) as long as it does not start a <scr> sequence
  • <scr> - Our literal <scr>
  • (?!.*<scr>) - A negative lookahead checking that no <scr> occurs anywhere later
  • .*$ - Subpattern matching the rest of the characters up to the end of the line.
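
As a quick check, the pattern can be run against the three examples from the question. This is only a minimal sketch in Python (the question does not name a language); the helper name contains_once is made up for illustration, and the asker's original attempt is included for comparison.

```python
import re

# Pattern from the answer: the whole string must contain "<scr>" exactly once.
ONLY_ONCE = re.compile(r"^(?:(?!<scr>).)*<scr>(?!.*<scr>).*$")

# The attempt from the question, kept for comparison (it has no anchors).
ATTEMPT = re.compile(r"((?:(?<!<scr>).)*<scr>(?:(?!<scr>).)*)")


def contains_once(text: str) -> bool:
    """Hypothetical helper: True if text contains "<scr>" exactly once."""
    return ONLY_ONCE.match(text) is not None


samples = [
    "hello-<scr>Keephello-endofstring",  # one occurrence
    "test-<scr>bla<scr>bla-end",         # two occurrences
    "hello-Keephello-end",               # no occurrence
]

for s in samples:
    print(contains_once(s), s)

# Without anchors, the attempt still finds a partial match in the second sample.
print(ATTEMPT.search(samples[1]).group(0))  # test-<scr>bla
```

The last line shows why the anchors matter: the unanchored attempt matches the part of the second sample up to the second <scr> instead of rejecting the whole text. If the text can span several lines, compiling the pattern with re.DOTALL lets . cross newlines. When a regex is not strictly required, text.count("<scr>") == 1 gives the same answer for these samples.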
answered Sep 29 '22 03:09 by Wiktor Stribiżew