How can I create a RegEx that can match the following:
a3bbb
aaaa3bbb
a4bbbb
aaa5bbbbb
I.e., a
(one or more times), then a non-negative number, then b
repeated 'that many times' (as many as the number between a
and b
).
Is this language regular? If not, can we construct a CFG for this?
Edit: As for whether the number is single digit, I would say no. (also as Daniel Centore and rici point out, the language is not even CF. Then the natural question is, is it context-sensitive or unrestricted?)
A repeat is an expression that is repeated an arbitrary number of times. An expression followed by '*' can be repeated any number of times, including zero. An expression followed by '+' can be repeated any number of times, but at least once.
quantifier matches the preceding element at least n times, where n is any integer but as few times as possible. It's the lazy counterpart of the greedy quantifier { n ,} .
The character + in a regular expression means "match the preceding character one or more times". For example A+ matches one or more of character A. The plus character, used in a regular expression, is called a Kleene plus .
To match a number of repetitions of a single digit, you can write ([0-9])\1* . This matches [0-9] into a group, then matches 0 or more repetions ( \1 ) of that group.
Like other answers have said, if the number is unbounded, the language is neither regular (if it's regular, pumping lemma says for a sufficiently long string, the b
's could be extended indefinitely independent of the number) nor context-free (if it's context-free, pumping lemma says for a sufficiently long number, the number and the b
's could be repeated, but not correctly).
But the language is context-sensitive, as it can be generated using the following grammar (I do it for base-3 number for simplicity, you can extend to base 10):
(1) S -> aS | aB (2) B -> BN | N (3) aN -> a0 | a1b | a2bb (4) 0N -> 00 | 01b | 02bb (5) 1N -> 10 | 11b | 12bb (6) 2N -> 20 | 21b | 22bb (7) bN -> WN (8) WN -> WX (9) WX -> NX (10)NX -> Nbbb
Rule (1) is to generate the a
's
Rule (2) is to generate each digit in the number
Rule (3)-(6) is to replace the left-most N
with a number and respective number of b
's.
Rule (7)-(10) is to have the N
"consume" the b
's to its left, and produce 3 b
's (10 b
's in base-10). Technically (7)-(10) is just bN -> Nbbb
.
Example:
To generate: a102bbbbbbbbbbb (102 in base-3 = 11 in base-10) S aB (1b) aBN (2a) aBNN (2a) aNNN (2b) a1bNN (3b) a1NbbbN (7)-(10) a1NbbNbbb (7)-(10) a1NbNbbbbbb (7)-(10) a1NNbbbbbbbbb (7)-(10) a10Nbbbbbbbbb (5a) a102bbbbbbbbbbb (4c)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With