I need a regex to match tags that look like <A>, <BB>, <CCC>, but not <ABC>, <aaa>, or <>. So the tag must consist of a single uppercase letter, repeated one or more times. I've tried <[A-Z]+>, but that doesn't work because it also matches <ABC>. Of course I can write something like <(A+|B+|C+|...)> and so on, but I wonder if there's a more elegant solution.
To match a character that has a special meaning in regex, you need to escape it by prefixing it with a backslash ( \ ). E.g., regex \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . You also need to use regex \\ to match "\" (backslash).
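As a minimal sketch of the escaping rule in Python's re module (the helper name matches_literally is made up for illustration), each pattern below escapes one metacharacter so that it matches only that literal character:

```python
import re

def matches_literally(pattern: str, text: str) -> bool:
    """Return True if the whole of `text` matches `pattern` (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# Raw strings (r"...") keep Python from interpreting the backslashes itself.
dot_ok = matches_literally(r"\.", ".")        # escaped dot matches "."
dot_other = matches_literally(r"\.", "a")     # ...and nothing else
plus_ok = matches_literally(r"\+", "+")
paren_ok = matches_literally(r"\(", "(")
backslash_ok = matches_literally(r"\\", "\\")  # one literal backslash

# re.escape builds the escaped pattern for arbitrary literal text.
literal = "1+1=(2)"
escaped_ok = matches_literally(re.escape(literal), literal)
```

Using re.escape avoids having to remember which characters are special when the text to match comes from elsewhere.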
[] denotes a character class. () denotes a capturing group. [a-z0-9] -- One character that is in the range a-z OR 0-9. (a-z0-9) -- Explicit capture of the literal text a-z0-9 (outside brackets, the hyphen does not define a range).
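The difference between the two can be checked with a short Python sketch (the variable names are illustrative only):

```python
import re

char_class = re.compile(r"[a-z0-9]")  # any ONE character from a-z or 0-9
group = re.compile(r"(a-z0-9)")       # the literal text "a-z0-9", captured

# The class matches single characters; the hyphens in the input are skipped.
class_hits = char_class.findall("a-z0-9")

# The group only matches the exact six-character string "a-z0-9".
group_match = group.search("xx a-z0-9 yy")
captured = group_match.group(1) if group_match else None
group_on_letters = group.search("abc")
```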
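Basically, (0+1)* matches any sequence of ones and zeroes. So, in your example, (0+1)*1(0+1)* should match any sequence that contains at least one 1. It would not match 000 , but it would match 010 , 1 , 111 etc. Here (0+1) means 0 OR 1 (this is formal regular-expression notation, where + is alternation; most regex engines write it as | instead), and 1* means any number of ones.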
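For readers who want to try this in a regex engine, here is a sketch in Python under the assumption that the formal + is translated to | (Python's own + means "one or more", so the pattern cannot be pasted in unchanged):

```python
import re

# Formal notation (0+1)*1(0+1)* rewritten with Python's alternation operator.
HAS_A_ONE = re.compile(r"(0|1)*1(0|1)*")

def contains_one(bits: str) -> bool:
    """True if `bits` is a string of 0s and 1s containing at least one 1."""
    return HAS_A_ONE.fullmatch(bits) is not None

results = {s: contains_one(s) for s in ["000", "010", "1", "111"]}

# Pasted verbatim, (0+1)* means "one or more 0s followed by a 1, repeated",
# which is a different language: "0101" fits it, "11" does not.
verbatim = re.compile(r"(0+1)*")
verbatim_0101 = verbatim.fullmatch("0101") is not None
verbatim_11 = verbatim.fullmatch("11") is not None
```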
Brackets indicate a set of characters to match. Any individual character between the brackets will match, and you can also use a hyphen to define a range of characters. You can use the ^ metacharacter as the first character inside the brackets to negate the set.
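A small Python sketch of the three bracket forms described above (sample strings chosen only for illustration):

```python
import re

# Explicit set: any one of the listed characters.
vowels = re.findall(r"[aeiou]", "regex")

# Range: a hyphen between two characters covers everything in between.
digits = re.findall(r"[0-9]", "a1b22")

# Negation: ^ as the first character inside the brackets inverts the set.
non_digits = re.findall(r"[^0-9]", "a1b22")
```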
You can use something like this (see this on rubular.com):
<([A-Z])\1*>
This uses a capturing group and a backreference. Basically:
(pattern) to "capture" a match
\n in your pattern, where n is the group number, to "refer back" to what that group matched
So in this case:
([A-Z]), an uppercase letter immediately following <
\1*, i.e. zero or more of that same letter
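The same pattern can be tried in Python, whose re module also supports \1 backreferences. This is only a sketch; the names TAG, is_repeated_tag, and find_tags are made up for illustration:

```python
import re

# One uppercase letter captured as group 1, then zero or more repeats of it.
TAG = re.compile(r"<([A-Z])\1*>")

def is_repeated_tag(text: str) -> bool:
    """True if `text` is exactly a tag made of one repeated uppercase letter."""
    return TAG.fullmatch(text) is not None

def find_tags(text: str) -> list[str]:
    """Return every matching tag found inside a longer string."""
    return [m.group(0) for m in TAG.finditer(text)]

accepted = [t for t in ["<A>", "<BB>", "<CCC>"] if is_repeated_tag(t)]
rejected = [t for t in ["<ABC>", "<aaa>", "<>"] if not is_repeated_tag(t)]
found = find_tags("x <A> y <ABC> z <CCC>")
```

Note that finditer is used rather than findall, because findall would return only the captured letter (group 1) instead of the whole tag.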