
What is a regular expression?

Tags:

regex

parsing

I know this question seems stupid, but it isn't. I mean, what is it exactly? I have a fair understanding of the parsing problem. I know BNF/EBNF, and I've written grammars to parse simple context-free languages in one of my college courses. I just never met regular expressions before! The only thing I remember about them is that a context-free grammar can do everything a regular expression can do.

Also, are they useful in everyday coding for parsing strings? A simple example would be helpful.

Khaled Alshaya asked Jul 06 '09 04:07





1 Answer

Regular expressions first came around in mathematics and automata theory. A regular expression is simply something which defines a regular language. Without going too much into what "regular" means, think of a language this way:

  1. A language is made up of strings. English is a language, for example, and it's made of strings.
  2. Those strings are made of symbols, and the set of available symbols is called an alphabet. So a string is just a concatenation of symbols from the alphabet.

So you could have a string (which is, remember, just a concatenation of symbols) which is not part of a given language. Or it could be in the language.

So let's say you have an alphabet made of 2 symbols: "0" and "1". And let's say you want to create a language using the symbols in that alphabet. You could create the following rule: "In order for a string to be in my language, it must have only 0's and 1's in it."

So these strings are in your language:

  • 0
  • 1
  • 01
  • 11001101
  • ...etc

These would not be in your language:

  • 2
  • peaches
  • 00101105

That's a pretty simple language. How about this: "In my language, each string [analogous to a valid 'word' in English] must begin with a 0, and then can be followed by any number of 0's or 1's."

These are in the language:

  • 0111111
  • 0000000
  • 0101010110001

These are not:

  • 1
  • 10000
  • 1010
  • 2000000

Well, rather than defining the language using words (and these languages might get very complex: "1 followed by 2 0's followed by any combination of 1's and 0's ending with a 1"), we came up with a syntax called "regular expressions" to define the language.

The first language would have been:

(0|1)*

(0 or 1, repeated any number of times, including zero times)

The next: 0(0|1)*

(0, followed by any number of 0's and 1's).
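As a rough sketch, the two expressions above can be tried out in Python with the standard `re` module. `re.fullmatch` asks whether the whole string is in the language. The helper name `in_language` is made up for this illustration, and the third pattern is only one possible reading of the wordy "1 followed by 2 0's ..." description from earlier.

```python
import re

# The two regular expressions from the text.
ANY_BITS = re.compile(r"(0|1)*")            # only 0's and 1's
STARTS_WITH_ZERO = re.compile(r"0(0|1)*")   # a 0, then any number of 0's and 1's

# Assumption: one way to write "1 followed by 2 0's followed by any
# combination of 1's and 0's ending with a 1".
WORDY = re.compile(r"100(0|1)*1")


def in_language(pattern, s):
    """Return True if the whole string s matches pattern."""
    return pattern.fullmatch(s) is not None


for s in ["0", "01", "11001101", "2", "peaches", "00101105"]:
    print(repr(s), in_language(ANY_BITS, s))

for s in ["0111111", "0101010110001", "1", "10000", "2000000"]:
    print(repr(s), in_language(STARTS_WITH_ZERO, s))

print(in_language(WORDY, "1001"))   # True
print(in_language(WORDY, "1000"))   # False: does not end with a 1
```

Note that `fullmatch` is used rather than `search`: `search` would report a hit if any part of the string matched, which is a different question from "is this string in the language?".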

So let's think of programming now. When you create a regex, you are saying "Look at this text. Return to me the strings which match this pattern." Which is really saying "I have defined a language. Return to me all strings within this document which are in my language."
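The "return to me all strings within this document" idea might look like this in Python. This is only a sketch: the sample text is invented, and the `\b` word boundaries are an addition (not part of the mathematical notation) so that each token has to be in the language as a whole.

```python
import re

# Invented sample "document" for illustration.
text = "ids: 0101, 1100, 0007, 010, and 0 again"

# Same language as 0(0|1)*. The group is written (?:...) because
# findall() returns the group contents instead of the whole match
# when a capturing group is present.
pattern = re.compile(r"\b0(?:0|1)*\b")

matches = pattern.findall(text)
print(matches)  # ['0101', '010', '0']
```

"1100" is skipped because it does not begin with a 0, and "0007" is skipped because the 7 is not in the alphabet.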

So when you create a "regex", you are actually defining a regular language, which is a mathematical concept. (In actuality, Perl-style regexes can define "nonregular" languages, for example through backreferences, but that is a separate issue.)
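To make the parenthetical remark concrete, here is a small sketch using a backreference, a feature of Perl-style engines (Python's `re` included) that goes beyond the mathematical definition. The language "some string of 0's and 1's, written twice in a row" is a standard example of a language that is not regular, yet the pattern below accepts exactly that.

```python
import re

# \1 must repeat exactly what group 1 captured, so the whole pattern
# means "a nonempty bit string, immediately followed by a copy of itself".
doubled = re.compile(r"([01]+)\1")

print(doubled.fullmatch("0101") is not None)    # True:  "01" + "01"
print(doubled.fullmatch("011011") is not None)  # True:  "011" + "011"
print(doubled.fullmatch("0110") is not None)    # False: "01" is not followed by "01"
print(doubled.fullmatch("010") is not None)     # False: odd length
```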

By learning the syntax of regex, you are learning the ins and outs of how to define a language, so that later you can test whether a given string is "in" the language. This is why people commonly say that regexes are for pattern matching: you look at a pattern and see whether a string "matches" the rules of your language.
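As for whether this is useful in ordinary code: a common use is pulling fields out of a line of text. The sketch below assumes a made-up log line format (date, level, message) and a hypothetical helper `parse_line`. The parentheses are capturing groups, which let you read back the pieces of a string that matched.

```python
import re

# Hypothetical line format, invented for this example:
#   YYYY-MM-DD LEVEL: message
LINE = re.compile(r"(\d{4})-(\d{2})-(\d{2}) (\w+): (.*)")


def parse_line(line):
    """Return the parsed fields as a dict, or None if the line is not in the language."""
    m = LINE.fullmatch(line)
    if m is None:
        return None
    year, month, day, level, message = m.groups()
    return {
        "date": (int(year), int(month), int(day)),
        "level": level,
        "message": message,
    }


print(parse_line("2009-07-06 INFO: server started"))
print(parse_line("not a log line"))  # None
```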

(this was long. does it answer your question at all?)

poundifdef answered Oct 18 '22 09:10
