Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Regular expression to match a number followed by a symbol repeated that many times?

Tags:

regex

grammar

How can I create a RegEx that can match the following:

a3bbb
aaaa3bbb
a4bbbb
aaa5bbbbb

I.e., a (one or more times), then a non-negative number, then b repeated 'that many times' (as many as the number between a and b).

Is this language regular? If not, can we construct a CFG for this?

Edit: As for whether the number is single digit, I would say no. (also as Daniel Centore and rici point out, the language is not even CF. Then the natural question is, is it context-sensitive or unrestricted?)

like image 590
Fi Zixer Avatar asked Dec 14 '15 01:12

Fi Zixer


People also ask

How do you repeat a regular expression?

A repeat is an expression that is repeated an arbitrary number of times. An expression followed by '*' can be repeated any number of times, including zero. An expression followed by '+' can be repeated any number of times, but at least once.

Which pattern matches the preceding pattern atleast n times?

quantifier matches the preceding element at least n times, where n is any integer but as few times as possible. It's the lazy counterpart of the greedy quantifier { n ,} .

What is the regular expression matching one or more specific characters?

The character + in a regular expression means "match the preceding character one or more times". For example A+ matches one or more of character A. The plus character, used in a regular expression, is called a Kleene plus .

How do you identify a repeating number?

To match a number of repetitions of a single digit, you can write ([0-9])\1* . This matches [0-9] into a group, then matches 0 or more repetions ( \1 ) of that group.


1 Answers

Like other answers have said, if the number is unbounded, the language is neither regular (if it's regular, pumping lemma says for a sufficiently long string, the b's could be extended indefinitely independent of the number) nor context-free (if it's context-free, pumping lemma says for a sufficiently long number, the number and the b's could be repeated, but not correctly).

But the language is context-sensitive, as it can be generated using the following grammar (I do it for base-3 number for simplicity, you can extend to base 10):

(1) S -> aS | aB
(2) B -> BN | N
(3) aN -> a0 | a1b | a2bb
(4) 0N -> 00 | 01b | 02bb
(5) 1N -> 10 | 11b | 12bb
(6) 2N -> 20 | 21b | 22bb
(7) bN -> WN
(8) WN -> WX
(9) WX -> NX
(10)NX -> Nbbb

Rule (1) is to generate the a's

Rule (2) is to generate each digit in the number

Rule (3)-(6) is to replace the left-most N with a number and respective number of b's.

Rule (7)-(10) is to have the N "consume" the b's to its left, and produce 3 b's (10 b's in base-10). Technically (7)-(10) is just bN -> Nbbb.

Example:

To generate: a102bbbbbbbbbbb (102 in base-3 = 11 in base-10)
S
aB (1b)
aBN (2a)
aBNN (2a)
aNNN (2b)
a1bNN (3b)
a1NbbbN (7)-(10)
a1NbbNbbb (7)-(10)
a1NbNbbbbbb (7)-(10)
a1NNbbbbbbbbb (7)-(10)
a10Nbbbbbbbbb (5a)
a102bbbbbbbbbbb (4c)
like image 108
justhalf Avatar answered Sep 28 '22 09:09

justhalf