Use of the colon symbol in regular expressions

Tags:

regex

I am new to regex and am studying it on regularexperssion.com. What is the use of a colon (:) in regular expressions?

For example:

$pattern = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$/'; 

which matches:

$url1 = "http://www.somewebsite.com";
$url2 = "https://www.somewebsite.com";
$url3 = "https://somewebsite.com";
$url4 = "www.somewebsite.com";
$url5 = "somewebsite.com";

Yeah, any help would be greatly appreciated.

badu asked Jan 10 '14 13:01


2 Answers

A colon (:) is simply a literal colon. It has no special meaning on its own; it only takes on a role as part of a few special constructs, for example clustering without capturing (also known as a non-capturing group):

(?:pattern) 

It also appears in the names of POSIX character classes, which are written inside a bracket expression, for example:

[[:upper:]] 

However, in your pattern every colon is just a literal colon.
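To make the distinction concrete, here is a minimal sketch using Python's `re` module. Python is my assumption for illustration; the question uses PHP, but these two constructs behave the same way in PCRE. The sample string `host:8080` is made up.

```python
import re

# The first colon is part of the (?: ... ) syntax (group without capturing).
# The second colon is an ordinary literal colon that must appear in the text.
m = re.match(r'(?:\w+):(\d+)', 'host:8080')

print(m.group(0))  # the whole match, including the literal colon
print(m.groups())  # only the capturing group (\d+) is reported
```

The non-capturing group still takes part in the match; it simply does not get a group number.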

Special characters used in your regex:

In character class [-+_~.\d\w]:

  • - means -
  • + means +
  • _ means _
  • ~ means ~
  • . means .
  • \d means any digit
  • \w means any word character

These characters are literal because they appear inside a character class []. Outside a character class, + and . have special meaning. The - is literal here because it is the first character in the class.
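A quick way to check this is to test single characters against the class. The sketch below uses Python's `re` module (my choice for illustration, not the asker's PHP); the test characters are arbitrary.

```python
import re

char_class = re.compile(r'[-+_~.\d\w]')

# Inside the class, + and . only match themselves; / is not listed, so it fails.
samples = ['+', '.', '-', '~', 'a', '7', '/']
inside = [char_class.fullmatch(c) is not None for c in samples]

# Outside a class, . matches any character (except a newline) and + is a quantifier.
dot_outside = re.fullmatch(r'.', '/') is not None
plus_outside = re.fullmatch(r'a+', 'aaa') is not None

print(inside, dot_outside, plus_outside)
```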

Other elements:

  • =? means an optional =, that is, a = that can occur 0 or 1 times.

Igor Chubin answered Sep 30 '22 02:09


I've decided to go you one better and explain the entire regex:

^                 # anchor to start of line
(                 # start grouping
 (                # start grouping
  [\w]+           # at least one of 0-9a-zA-Z_
  :               # a literal colon
 )                # end grouping
 ?                # this grouping is optional
 \/\/             # two literal slashes
)                 # end capture
?                 # this grouping is optional
(
 (
  [\d\w]          # exactly one of 0-9a-zA-Z_
                  # having \d is redundant
  |               # alternation
  %               # literal % sign
  [a-fA-f\d]{2,2} # exactly 2 hexadecimal digits
                  # should probably be A-F
                  # using {2} would have sufficed
 )+               # at least one of these groups
 (                # start grouping
  :               # literal colon
  (
   [\d\w]
   |
   %
   [a-fA-f\d]{2,2}
  )+
 )?               # Same grouping, but it is optional
                  # and there can be only one
 @                # literal @ sign
)?                # this group is optional
(
 [\d\w]           # same as [\w], explained above
 [-\d\w]{0,253}   # includes a dash (-) as a valid character
                  # between 0 and 253 of these characters
 [\d\w]           # end with \w.  They want at most 255
                  # total and - cannot be at the start
                  # or end
 \.               # literal period
)+                # at least one of these groups
[\w]{2,4}         # two to four \w characters
(
 :                # literal colon
 [\d]+            # at least one digit
)?
(
 \/               # literal slash
 (
  [-+_~.\d\w]     # one of these characters
  |               # *or*
  %               # % with two hex digit combo
  [a-fA-f\d]{2,2}
 )*               # zero or more of these groups
)*                # zero or more of these groups
(
 \?               # literal question mark
 (
  &?              # optional literal &
  (
   [-+_~.\d\w]
   |
   %
   [a-fA-f\d]{2,2}
  )
  =?              # optional literal =
 )*               # zero or more of this group
)?                # this group is optional
(
 #                # literal #
 (
  [-+_~.\d\w]
  |
  %
  [a-fA-f\d]{2,2}
 )*
)?
$                 # anchor to end of line
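To see where the three literal colons end up in practice, the pattern can be run and the relevant groups inspected. This is a rough sketch in Python rather than the asker's PHP; the helper name `colon_parts` and the URL with credentials and a port are hypothetical additions for illustration. The pattern itself is copied unchanged from the question and only split across lines for readability.

```python
import re

# The pattern from the question, split into adjacent raw strings (Python joins them).
URL_PATTERN = re.compile(
    r'^(([\w]+:)?\/\/)?'
    r'(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?'
    r'([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}'
    r'(:[\d]+)?'
    r'(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*'
    r'(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?'
    r'(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$'
)


def colon_parts(url):
    """Return the three captured pieces that contain a literal colon (hypothetical helper)."""
    m = URL_PATTERN.match(url)
    if m is None:
        return None
    # Groups are numbered by their opening parenthesis, left to right.
    return {'scheme': m.group(2), 'password': m.group(5), 'port': m.group(8)}


print(colon_parts('http://www.somewebsite.com'))
print(colon_parts('somewebsite.com'))
print(colon_parts('http://user:pass@example.com:8080/a'))
```

Each colon simply has to be present in the input at that position; groups that did not take part in the match come back as `None`.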

It's important to understand what the metacharacters/sequences are. Some sequences are not meta when used in certain contexts (especially a character class). I've cataloged them for you:

meta with no context

  • ^ -- zero width start of line
  • () -- grouping/capture
  • ? -- zero or one of the preceding sequence
  • + -- one or more of the preceding sequence
  • * -- zero or more of the preceding sequence
  • [] -- character class
  • \w -- alphanumeric characters and _. Opposite of \W
  • | -- alternation
  • {} -- quantifier: explicit repetition count for the preceding sequence, e.g. {2,4}
  • $ -- zero width end of line

Note that :, @, and % are not on this list: they have no special/meta meaning in the raw context.
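One way to double-check which characters are meta is to ask the regex library to escape them. The sketch below relies on Python's `re.escape` (Python 3.7 or later, where only potentially special characters are escaped); the sample string `user:pass@%20` is made up.

```python
import re

# re.escape puts a backslash only in front of characters that could be special.
plain = re.escape(':@%')    # none of these is a metacharacter
meta = re.escape('+?*|')    # each of these is

# A literal colon, at sign, and percent sign match themselves without escaping.
m = re.fullmatch(r'\w+:\w+@%\d+', 'user:pass@%20')

print(plain, meta, m is not None)
```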

meta inside character class

] ends the character class. - creates a range of characters unless it is at the start or the end of the character class or escaped with a backslash. ^ negates the class when it is the first character, and \ escapes the character that follows it.
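The range rule is easy to check by hand. Here is a small sketch in Python (my choice of language for illustration; the helper `matches` is hypothetical). It also shows why the `A-f` range in the question's pattern accepts more than hexadecimal digits: the range runs from `A` to `f` in character-code order, which includes `G` through `Z`.

```python
import re


def matches(pattern, text):
    """True if the whole text matches the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None


# Between two characters, - creates a range.
range_b = matches(r'[a-c]', 'b')        # b lies between a and c
range_dash = matches(r'[a-c]', '-')     # the dash itself is not in the class

# At the start of the class, or escaped, - is a literal dash.
leading_dash = matches(r'[-ac]', '-')
escaped_dash = matches(r'[a\-c]', '-')
escaped_b = matches(r'[a\-c]', 'b')     # no range is formed, so b is not included

# The A-f range from the question is wider than the intended A-F.
typo_range = matches(r'[a-fA-f\d]{2,2}', 'ZZ')
fixed_range = matches(r'[a-fA-F\d]{2}', 'ZZ')

print(range_b, range_dash, leading_dash, escaped_dash, escaped_b, typo_range, fixed_range)
```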

grouping assertions

A (? combination starts a grouping assertion. For example, (?: means group but do not capture. The regex /(?:a)/ matches the string "a", but the a is not captured for use in replacements or match groups as it would be with /(a)/.

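(? can also introduce lookahead/lookbehind assertions: (?=, (?!, (?<=, and (?<!. A (? followed by anything else is never a literal ?: depending on the regex engine, it either introduces another special construct (for example inline flags such as (?i)) or is a syntax error.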
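The four lookaround forms can be tried out on a short string. This is a minimal sketch in Python's `re` module (an assumption for illustration; PCRE in PHP supports the same four forms). The sample text is made up, and the colons inside the assertions are ordinary literal colons.

```python
import re

text = 'port:8080 user:alice'

# (?<=port:) lookbehind: the digits must directly follow "port:", which is not consumed.
port = re.search(r'(?<=port:)\d+', text).group(0)

# (?=:) lookahead: words that are directly followed by a colon.
keys = re.findall(r'\w+(?=:)', text)

# (?!:) negative lookahead: words after a colon that are not followed by another colon.
values = re.findall(r'(?<=:)\w+(?!:)', text)

# (?<!:) negative lookbehind: lowercase words that do not directly follow a colon.
not_after_colon = re.findall(r'(?<!:)\b[a-z]+', text)

print(port, keys, values, not_after_colon)
```

In every case the assertion only checks the surrounding text; the colon never becomes part of the returned match.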

Explosion Pills answered Sep 30 '22 01:09