I saw this question, and was curious as to what the pumping lemma was (Wikipedia didn't help much). I understand that it's basically a theoretical proof that must be true in order for a language to be in a certain class, but beyond that I don't really get it. Anyone care to try to explain it at a fairly granular level in a way understandable by non mathematicians/comp sci doctorates?

What is the Pumping Lemma in Layman's terms?

2 Answers

The pumping lemma is a simple tool for proving that a language is not regular, meaning that no Finite State Machine (FSM) can be built that recognizes it. The canonical example is the language (a^n)(b^n). This is the simple language which is just any number of a's, followed by the same number of b's. So the strings

ab aabb aaabbb aaaabbbb

etc. are in the language, but

aab bab aaabbbbbb

etc. are not.
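To make the membership rule concrete, here is a minimal sketch of a checker for this language. The name `in_anbn` is made up for illustration, and it assumes n is at least 1 (the examples above start at "ab"). Note that it relies on counting with an unbounded integer, which is exactly what a finite state machine does not have.

```python
def in_anbn(s: str) -> bool:
    """Return True if s is n a's followed by exactly n b's (assuming n >= 1)."""
    n = len(s) // 2
    # Even, non-zero length, and the first half is all a's, the second all b's.
    return len(s) >= 2 and len(s) % 2 == 0 and s == "a" * n + "b" * n


for s in ["ab", "aabb", "aaabbb", "aaaabbbb", "aab", "bab", "aaabbbbbb"]:
    print(s, in_anbn(s))
```

The first four strings print True and the last three print False.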

It's simple enough to build a FSM for these examples:

[Image: FSM (https://i.stack.imgur.com/eqcKN.jpg)]

This one will work all the way up to n=4. The problem is that our language didn't put any constraint on n, and Finite State Machines have to be, well, finite. No matter how many states I add to this machine, someone can give me an input where n equals the number of states plus one and my machine will fail. So if there can be a machine built to read this language, there must be a loop somewhere in there to keep the number of states finite. With these loops added:
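A rough sketch of that "just add more states" approach, for anyone who wants to poke at it. This is not necessarily the machine in the picture; the state naming and the helpers `build_bounded_dfa` and `accepts` are assumptions made for illustration. It builds a loop-free machine for every n up to `max_n`, and whatever `max_n` you choose, the input with `max_n + 1` a's falls off the end.

```python
def build_bounded_dfa(max_n: int):
    """Loop-free machine accepting a^n b^n for 1 <= n <= max_n.

    States: ("a", i) after reading i a's; ("b", i, j) after i a's and j b's.
    Missing transitions mean "reject".
    """
    delta = {}
    for i in range(max_n):
        delta[(("a", i), "a")] = ("a", i + 1)
    for i in range(1, max_n + 1):
        delta[(("a", i), "b")] = ("b", i, 1)
        for j in range(1, i):
            delta[(("b", i, j), "b")] = ("b", i, j + 1)
    accepting = {("b", i, i) for i in range(1, max_n + 1)}
    return delta, ("a", 0), accepting


def accepts(dfa, s: str) -> bool:
    delta, state, accepting = dfa
    for ch in s:
        state = delta.get((state, ch))
        if state is None:  # no arrow for this character: reject
            return False
    return state in accepting


dfa4 = build_bounded_dfa(4)
print(accepts(dfa4, "aaaabbbb"))    # n = 4: accepted
print(accepts(dfa4, "aaaaabbbbb"))  # n = 5: rejected, the machine ran out of states
```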

[Image: FSM 2 (https://i.stack.imgur.com/bSNTe.jpg)]

all of the strings in our language will be accepted, but there is a problem. After the first four a's, the machine loses count of how many a's have been input because it stays in the same state. That means that after four, I can add as many a's as I want to the string, without adding any b's, and the machine will end up in the same state and give the same answer. This means that the strings:

aaaa(a*)bbbb

with (a*) representing any number of a's, will all be accepted by the machine even though they obviously aren't all in the language. In this context, we would say that the part of the string (a*) can be pumped.

The fact that the Finite State Machine is finite and n is not bounded guarantees that any machine which accepts all strings in the language MUST have this property. The machine must loop at some point, and at the point that it loops the string can be pumped. Pumping produces strings that are not in the language but that the machine still accepts, so no Finite State Machine can recognize exactly this language, and the language is not regular.
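Here is a small sketch of a machine with loops, to show the over-acceptance in action. It is an assumption about what such a machine could look like, not a transcription of the diagram: it models the loops as two counters that saturate at a hypothetical cap of 4, so it only ever has finitely many states.

```python
CAP = 4  # hypothetical number of a's the machine can still tell apart


def looped_accepts(s: str, cap: int = CAP) -> bool:
    """Finite-state recognizer: both counters stop growing at `cap` (the loop)."""
    a_count = b_count = 0
    seen_b = False
    for ch in s:
        if ch == "a" and not seen_b:
            a_count = min(a_count + 1, cap)  # stays at cap: self-loop on 'a'
        elif ch == "b":
            seen_b = True
            b_count = min(b_count + 1, cap)  # stays at cap: self-loop on 'b'
        else:
            return False  # an 'a' after a 'b', or an unknown character
    return a_count >= 1 and a_count == b_count


print(looped_accepts("aaaaaabbbbbb"))  # in the language, accepted
print(looped_accepts("aaaaaaaabbbb"))  # 8 a's and 4 b's: also accepted, wrongly
```

Raising `cap` only moves the problem: the same trick works with `cap + 1` a's followed by `cap` b's.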

Remember that regular expressions and Finite State Machines are equivalent, then replace a and b with opening and closing HTML tags, which can be nested within each other, and you can see why it is not possible to use regular expressions to parse arbitrarily nested HTML.
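As a quick illustration with Python's `re` module: the pattern below uses only plain regular operators (grouping and star), with `<b>` and `</b>` standing in for a and b. It is one naive attempt, chosen for illustration; the point is that it cannot tie the number of closing tags to the number of opening tags.

```python
import re

# Any number of opening tags followed by any number of closing tags.
TAGS = re.compile(r"(?:<b>)*(?:</b>)*")


def regex_accepts(s: str) -> bool:
    return TAGS.fullmatch(s) is not None


print(regex_accepts("<b><b></b></b>"))  # properly nested: accepted
print(regex_accepts("<b><b></b>"))      # one closing tag missing: still accepted
```

Tightening the pattern to a fixed depth helps up to that depth and fails one level deeper, just like the machine for n=4.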

125

answered Oct 25 '22 03:10

Graphics Noob

It's a device intended to prove that a given language cannot be of a certain class.

Let's consider the language of balanced parentheses (meaning symbols '(' and ')', and including all strings that are balanced in the usual meaning, and none that aren't). We can use the pumping lemma to show this isn't regular.

(A language is a set of possible strings. A parser is some sort of mechanism we can use to see if a string is in the language, so it has to be able to tell the difference between a string in the language and a string outside the language. A language is "regular" (or "context-free" or "context-sensitive" or whatever) if there is a regular (or whatever) parser that can recognize it, distinguishing between strings in the language and strings not in the language.)

LFSR Consulting has provided a good description. We can draw a parser for a regular language as a finite collection of boxes and arrows, with the arrows representing characters and the boxes connecting them (acting as "states"). (If it's more complicated than that, it isn't a regular language.) If we can get a string longer than the number of boxes, it means we went through one box more than once. That means we had a loop, and we can go through the loop as many times as we want.

Therefore, for a regular language, any string in the language that is at least as long as the number of boxes can be divided into xyz, where x is the characters we need to get to the start of the loop, y is the actual loop (at least one character), and z is whatever we need to make the string valid after the loop. Going around the loop any number of times, including zero, still gives a string the parser accepts. The important thing is that the combined length of x and y is limited by the number of boxes. After all, once we have read as many characters as there are boxes, we've necessarily visited some box twice, and so there's a loop.
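The xyz split can be found mechanically by running the machine and watching for the first box that gets visited twice. A minimal sketch, assuming a toy two-box machine for the language (ab)*; the names `DELTA`, `accepts` and `split_at_first_loop` are made up for this example.

```python
# Toy machine for (ab)*: box 0 --a--> box 1 --b--> box 0, with box 0 accepting.
DELTA = {(0, "a"): 1, (1, "b"): 0}
START, ACCEPTING = 0, {0}


def accepts(s: str) -> bool:
    state = START
    for ch in s:
        state = DELTA.get((state, ch))
        if state is None:  # no arrow for this character: reject
            return False
    return state in ACCEPTING


def split_at_first_loop(delta, start, s):
    """Return (x, y, z), where y is the part read while going around the first loop.

    Assumes every step of s has an arrow in delta (true for accepted strings).
    """
    seen = {start: 0}  # box -> how many characters had been read on first entry
    state = start
    for pos, ch in enumerate(s, start=1):
        state = delta[(state, ch)]
        if state in seen:
            first = seen[state]
            return s[:first], s[first:pos], s[pos:]
        seen[state] = pos
    return None  # s was too short to force a repeated box


x, y, z = split_at_first_loop(DELTA, START, "abab")
print(repr(x), repr(y), repr(z))  # '' 'ab' 'ab'
print(all(accepts(x + y * k + z) for k in range(5)))  # True
```

Here x and y together are 2 characters long, which is no more than the 2 boxes in the machine, and repeating y any number of times keeps the string accepted.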

So, in our balanced language, we can start by writing any number of left parentheses. In particular, for any given parser, we can write more left parens than there are boxes, and so the parser can't tell how many left parens there are. Because x and y together are no longer than the number of boxes, both fall inside that opening run of left parens. Therefore, x is some number of left parens, and this is fixed. y is at least one left paren, and it can be repeated indefinitely. We can say that z is whatever is left: the remaining left parens followed by the right parens.

This means that we might have a string of 43 left parens and 43 right parens recognized by our parser, but the parser can't tell that from a string of 44 left parens and 43 right parens, which isn't in our language, so the parser can't parse our language.

Since any possible regular parser has a fixed number of boxes, we can always write more left parens than that, and by the pumping lemma we can then add more left parens in a way that the parser can't tell. Therefore, the balanced parenthesis language can't be parsed by a regular parser, and therefore isn't a regular language.
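The argument can be checked by brute force for any particular number of boxes. In this sketch, `p` stands for the (hypothetical) number of boxes in some parser, and the helper names are made up for illustration. It takes p left parens followed by p right parens, tries every allowed way of choosing x and y, and confirms that repeating y once always unbalances the string.

```python
def is_balanced(s: str) -> bool:
    """Balanced-parentheses check; assumes s contains only '(' and ')'."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:  # a ')' appeared with nothing open
            return False
    return depth == 0


def every_split_breaks(p: int) -> bool:
    """True if, for every split with len(x + y) <= p and y non-empty,
    repeating y once yields an unbalanced string."""
    s = "(" * p + ")" * p
    for xy_len in range(1, p + 1):
        for x_len in range(xy_len):
            x, y, z = s[:x_len], s[x_len:xy_len], s[xy_len:]
            if is_balanced(x + y * 2 + z):
                return False
    return True


print(is_balanced("(" * 43 + ")" * 43))  # True
print(every_split_breaks(43))            # True: no choice of loop survives pumping
```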

answered Oct 25 '22 02:10

David Thornley


Tags: theory, proof, pumping-lemma

shsteimer
