What is a regular expression?

Tags: regex, parsing

I know this question seems stupid, but it isn't. I mean, what is a regular expression exactly? I have a fair understanding of the parsing problem. I know BNF/EBNF, and I've written grammars to parse simple context-free languages in one of my college courses. I just never met regular expressions before! The only thing I remember about them is that a context-free grammar can do everything a regular expression can do.

Also, are they useful in everyday coding for parsing strings? A simple example would be helpful.

asked Jul 06 '09 04:07

Khaled Alshaya

1 Answer

Regular expressions first came around in mathematics and automata theory. A regular expression is simply something which defines a regular language. Without going too much into what "regular" means, think of a language this way:

A language is made up of strings. English is a language, for example, and it's made of strings.
Those strings are made of symbols, and the set of available symbols is called an alphabet. So a string is just a concatenation of symbols from the alphabet.

So a given string (which is, remember, just a concatenation of symbols) may or may not be part of a given language.

So let's say you have an alphabet made of 2 symbols: "0" and "1". And let's say you want to create a language using the symbols in that alphabet. You could create the following rule: "In order for a string to be in my language, it must have only 0's and 1's in it."

So these strings are in your language:

0
1
01
11001101
...etc

These would not be in your language:

2
peaches
00101105

That's a pretty simple language. How about this: "In my language, each string [analogous to a valid 'word' in English] must begin with a 0, and can then be followed by any number of 0's or 1's."

These are in the language:

0111111
0000000
0101010110001

These are not:

1
10000
1010
2000000
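Each rule above can also be written as an ordinary function that decides whether a string is in the language. The following is only a minimal sketch: the names `ALPHABET`, `in_first_language` and `in_second_language` are made up for illustration, and treating the empty string as a member of the first language is an assumption the rule in words does not settle.

```python
ALPHABET = {"0", "1"}  # the two-symbol alphabet from the example


def in_first_language(s: str) -> bool:
    """Rule 1: the string contains only 0's and 1's.

    Assumption: the empty string counts as a member.
    """
    return all(symbol in ALPHABET for symbol in s)


def in_second_language(s: str) -> bool:
    """Rule 2: the string begins with 0, followed by any number of 0's or 1's."""
    return s.startswith("0") and in_first_language(s)


for candidate in ["0111111", "0000000", "1", "10000", "2000000"]:
    print(candidate, in_second_language(candidate))
```

Writing a new function for every rule gets tedious quickly, which is the motivation for a compact notation.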

Rather than defining a language in words, which gets unwieldy as the languages get more complex ("1 followed by two 0's, followed by any combination of 1's and 0's, ending with a 1"), we came up with a syntax called "regular expressions" to define the language.

The first language would have been:

(0|1)*

(0 or 1, repeated any number of times, including zero)

The next:

0(0|1)*

(0, followed by any number of 0's and 1's).
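These two expressions can be tried out directly. Here is a small sketch using Python's standard `re` module; the helper name `in_language` is invented for this example. `fullmatch` is used because membership in a language is a question about the whole string, not about some part of it.

```python
import re

FIRST = re.compile(r"(0|1)*")    # only 0's and 1's
SECOND = re.compile(r"0(0|1)*")  # a 0, then any number of 0's and 1's


def in_language(pattern: re.Pattern, s: str) -> bool:
    # fullmatch succeeds only if the entire string matches the pattern
    return pattern.fullmatch(s) is not None


for s in ["01", "11001101", "peaches", "00101105"]:
    print(s, in_language(FIRST, s))

for s in ["0111111", "0101010110001", "10000", "2000000"]:
    print(s, in_language(SECOND, s))
```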

So let's think of programming now. When you create a regex, you are saying "Look at this text. Return to me the strings which match this pattern." Which is really saying "I have defined a language. Return to me all strings within this document which are in my language."
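As a rough illustration of "return to me all strings within this document which are in my language", the sketch below searches a made-up piece of text for words belonging to the second language. The sample text and the use of `\b` (word boundary) to delimit candidates are assumptions of this example, and `[01]` is just shorthand for `(0|1)`.

```python
import re

# Hypothetical document to search; not from the original answer.
document = "0110 1001 0 2000 peaches 0101"

# \b anchors each candidate to whole "words" in the text.
pattern = re.compile(r"\b0[01]*\b")

matches = pattern.findall(document)
print(matches)  # ['0110', '0', '0101']
```

Words such as "1001" and "2000" are skipped because they do not begin with 0.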

So when you create a "regex", you are actually defining a regular language, which is a mathematical concept. (In actuality, Perl-like regexes can also define non-regular languages, for example through backreferences, but that is a separate issue.)
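To give a feel for that aside, here is a sketch of one non-regular language that a Perl-style engine can still describe: strings made of some block of 0's and 1's written twice in a row. The pattern relies on a backreference (`\1`), a feature of Python's `re` module that goes beyond the mathematical definition of regular expressions; the name `is_doubled` is invented for this example.

```python
import re

# ([01]+) captures a non-empty block; \1 demands the same block again.
DOUBLED = re.compile(r"([01]+)\1")


def is_doubled(s: str) -> bool:
    return DOUBLED.fullmatch(s) is not None


for s in ["0101", "00", "0110", "010"]:
    print(s, is_doubled(s))
```

Recognizing this language requires remembering an arbitrarily long block, which a finite automaton cannot do.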

By learning the syntax of regex, you are learning the ins and outs of how to define a language, so that later you can check whether a given string is "in" the language. This is why people commonly say that regexes are for pattern matching: you take a pattern and see whether a string "matches" the rules of your language.

(This was long. Does it answer your question at all?)

answered Oct 18 '22 09:10

poundifdef
