I need a regex to match tags that look like <A>, <BB>, <CCC>, but not <ABC>, <aaa>, <>. So the tag must consist of the same uppercase letter, repeated. I've tried <[A-Z]+>, but that doesn't work because it also matches <ABC>. Of course I can write something like <(A+|B+|C+|...)> and so on, but I wonder if there's a more elegant solution.
To match a character that has special meaning in regex, prefix it with a backslash ( \ ) to form an escape sequence. For example, \. matches "." ; \+ matches "+" ; and \( matches "(" . To match a backslash itself, use \\ .
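As a small illustration of the escaping rule above, here is a sketch using Python's re module. The sample strings and variable names are made up for the example.

```python
import re

# Unescaped "." is a wildcard; escaped "\." matches only a literal dot.
any_char = re.fullmatch(r"a.c", "abc") is not None       # True
literal_dot = re.fullmatch(r"a\.c", "abc") is not None   # False

# Escaped "+" and "(" are matched literally.
plus_and_paren = re.findall(r"\+|\(", "f(1+2)")

# "\\" in the pattern matches a single backslash in the text.
backslashes = re.findall(r"\\", r"C:\temp\x")

# re.escape adds the backslashes for you when the text is not known in advance.
escaped = re.escape("1+1=2.")
escaped_matches = re.fullmatch(escaped, "1+1=2.") is not None
```

Raw strings (the r"..." prefix) keep Python itself from interpreting the backslashes before the regex engine sees them.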
[] denotes a character class. () denotes a capturing group. [a-z0-9] matches one character that is in the range a-z OR 0-9. (a-z0-9) captures the literal text a-z0-9 .
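The difference between the two can be checked quickly. This is a sketch in Python with invented sample input.

```python
import re

# Character class: each match is ONE character from a-z or 0-9.
class_hits = re.findall(r"[a-z0-9]", "a-Z9")

# Capturing group: the parentheses capture the literal text "a-z0-9".
group_match = re.search(r"(a-z0-9)", "xx a-z0-9 yy")
captured = group_match.group(1) if group_match else None

# The group pattern does not match a lone lowercase letter.
group_on_letter = re.search(r"(a-z0-9)", "b")
```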
Basically, (0+1)* matches any sequence of ones and zeroes. So, in your example, (0+1)*1(0+1)* should match any sequence that contains a 1. It would not match 000 , but it would match 010 , 1 , 111 , etc. Here (0+1) means 0 OR 1 (this is formal-language notation; most regex engines write it as (0|1) ). 1* means any number of ones.
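To try this in a programming language, the formal "+" (meaning OR) has to be rewritten as "|". The sketch below assumes Python's re module and uses the example strings from the paragraph above.

```python
import re

# Formal (0+1)*1(0+1)* translated to engine syntax: "+" (OR) becomes "|".
CONTAINS_ONE = re.compile(r"(0|1)*1(0|1)*")

checks = {s: bool(CONTAINS_ONE.fullmatch(s)) for s in ["000", "010", "1", "111"]}

# Typed literally, "(0+1)*" means something else to a regex engine:
# "one or more 0s followed by a 1", repeated. It rejects "10".
literal_reading = bool(re.fullmatch(r"(0+1)*", "10"))
```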
Brackets indicate a set of characters to match. Any individual character between the brackets will match, and you can also use a hyphen to define a range (for example, a-z). Placing the ^ metacharacter first inside the brackets negates the set.
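A minimal Python sketch of a plain set, a range, and a negated set. The input strings are arbitrary.

```python
import re

vowels = re.findall(r"[aeiou]", "regex")      # any one of the listed characters
digits = re.findall(r"[0-9]", "a1b22")        # hyphen defines a range
non_digits = re.findall(r"[^0-9]", "a1b22")   # leading ^ negates the set
```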
You can use something like this (see this on rubular.com):
<([A-Z])\1*>
This uses a capturing group and a backreference. Basically, you use (pattern) to "capture" a match, and \n in your pattern, where n is the group number, to "refer back" to what that group matched.
So in this case, ([A-Z]) captures an uppercase letter immediately following < , and \1* then matches zero or more of that same letter.
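The pattern can be checked against the examples from the question. The answer links to a Ruby tester; the sketch below assumes Python's re module instead, which supports the same backreference syntax.

```python
import re

# One uppercase letter captured as group 1, then zero or more copies of it.
TAG = re.compile(r"<([A-Z])\1*>")

samples = ["<A>", "<BB>", "<CCC>", "<ABC>", "<aaa>", "<>"]
results = {s: bool(TAG.fullmatch(s)) for s in samples}

for sample, ok in results.items():
    print(sample, "matches" if ok else "does not match")
```

fullmatch is used here so the whole string has to be a tag. To find tags inside longer text, TAG.finditer(text) would be the usual choice.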