I am new to regex and am studying it on regularexperssion.com. I need to know what a colon (:) does in regular expressions.
For example:
$pattern = '/^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$/';
which matches:
$url1 = "http://www.somewebsite.com";
$url2 = "https://www.somewebsite.com";
$url3 = "https://somewebsite.com";
$url4 = "www.somewebsite.com";
$url5 = "somewebsite.com";
Yeah, any help would be greatly appreciated.
Colon does not have special meaning in a character class and does not need to be escaped.
$ means "Match the end of the string" (the position after the last character in the string).
If you use a semicolon (;) in a keyword expression, it will split the keywords into multiple parts. The semicolon is not a regex metacharacter and can normally be used unescaped in regular expressions, but it has a different function in HES, so it cannot be used in expressions there.
A colon (:) is simply a colon. It has no special meaning except inside a few special constructs, for example clustering without capturing (also known as a non-capturing group):
(?:pattern)
It also appears in the names of POSIX character classes, for example:
[[:upper:]]
However, in your case the colon is just a colon.
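To see that a bare colon is matched literally, here is a minimal sketch. It uses Python's `re` module rather than PHP, which is an assumption made purely for illustration; the colon behaves the same way in both. The sample strings are made up.

```python
import re

# Outside a character class, a bare colon matches a literal ':'.
port = re.search(r":(\d+)$", "somewebsite.com:8080")
print(port.group(1))

# Inside a character class it is also literal, and escaping it changes nothing.
parts_plain = re.split(r"[:]", "a:b:c")
parts_escaped = re.split(r"\:", "a:b:c")
print(parts_plain, parts_escaped)
```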
Special characters used in your regex:
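In the character class [-+_~.\d\w]:
- means a literal - (it comes first in the class, so it does not form a range)
+ means a literal +
_ means a literal _
~ means a literal ~
. means a literal .
\d means any digit
\w means any word character
These symbols have these meanings because they are used inside a character class []. Outside a character class, + and . have special meaning.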
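A quick way to check the difference between "inside a class" and "outside a class" is to try both. This is only a sketch in Python's `re` module (an assumption for illustration, the original pattern is PHP) with made-up test strings:

```python
import re

# Inside [...] the characters '-', '+', '~' and '.' are ordinary set members.
in_class = re.findall(r"[-+_~.\d\w]", "a+b.c~d-e f!")
print(in_class)

# Outside a class, '.' matches any character and '+' repeats the previous item.
dot_matches_any = re.fullmatch(r"a.c", "axc") is not None
plus_repeats = re.fullmatch(r"ab+", "abbb") is not None
print(dot_matches_any, plus_repeats)
```

The space and the `!` are skipped by the first pattern because they are not listed in the class.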
Other elements:
=? means an = that can occur 0 or 1 times; in other words, an optional =.
I've decided to go you one better and explain the entire regex:
^                       # anchor to start of line
(                       # start grouping
  (                     # start grouping
    [\w]+               # at least one of 0-9a-zA-Z_
    :                   # a literal colon
  )                     # end grouping
  ?                     # this grouping is optional
  \/\/                  # two literal slashes
)                       # end capture
?                       # this grouping is optional
(
  (
    [\d\w]              # exactly one of 0-9a-zA-Z_
                        # having \d is redundant
    |                   # alternation
    %                   # literal % sign
    [a-fA-f\d]{2,2}     # exactly 2 hexadecimal digits
                        # should probably be A-F
                        # using {2} would have sufficed
  )+                    # at least one of these groups
  (                     # start grouping
    :                   # literal colon
    (
      [\d\w]
      |
      %
      [a-fA-f\d]{2,2}
    )+
  )?                    # same grouping, but it is optional
                        # and there can be only one
  @                     # literal @ sign
)?                      # this group is optional
(
  [\d\w]                # same as [\w], explained above
  [-\d\w]{0,253}        # includes a dash (-) as a valid character
                        # between 0 and 253 of these characters
  [\d\w]                # end with \w. They want at most 255
                        # total and - cannot be at the start
                        # or end
  \.                    # literal period
)+                      # at least one of these groups
[\w]{2,4}               # two to four \w characters
(
  :                     # literal colon
  [\d]+                 # at least one digit
)?
(
  \/                    # literal slash
  (
    [-+_~.\d\w]         # one of these characters
    |                   # *or*
    %                   # % with two hex digit combo
    [a-fA-f\d]{2,2}
  )*                    # zero or more of these groups
)*                      # zero or more of these groups
(
  \?                    # literal question mark
  (
    &?                  # optional literal &
    (
      [-+_~.\d\w]
      |
      %
      [a-fA-f\d]{2,2}
    )
    =?                  # optional literal =
  )*                    # zero or more of this group
)?                      # this group is optional
(
  #                     # literal #
  (
    [-+_~.\d\w]
    |
    %
    [a-fA-f\d]{2,2}
  )*
)?
$                       # anchor to end of line
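If you want to poke at the pattern yourself, the sketch below runs it against the five URLs from the question. It is written in Python rather than PHP (my assumption, for illustration only); the pattern is copied from the question without the PHP `/` delimiters and split across lines for readability. The `.museum` host and the `%ZZ` string are made-up test inputs.

```python
import re

# Pattern from the question, minus the PHP '/' delimiters.
PATTERN = re.compile(
    r"^(([\w]+:)?\/\/)?"
    r"(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?"
    r"([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?"
    r"(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*"
    r"(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?"
    r"(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$"
)

urls = [
    "http://www.somewebsite.com",
    "https://www.somewebsite.com",
    "https://somewebsite.com",
    "www.somewebsite.com",
    "somewebsite.com",
]
matches = {url: PATTERN.match(url) is not None for url in urls}
for url, ok in matches.items():
    print(url, ok)

# [\w]{2,4} caps the last label at four characters, so longer TLDs fail.
long_tld = PATTERN.match("somewebsite.museum") is not None

# The A-f range also covers G-Z and some punctuation, so "%ZZ" slips through.
loose_hex = re.fullmatch(r"%[a-fA-f\d]{2,2}", "%ZZ") is not None
strict_hex = re.fullmatch(r"%[a-fA-F\d]{2}", "%ZZ") is not None
print(long_tld, loose_hex, strict_hex)
```

The last two checks illustrate the "should probably be A-F" remark in the breakdown: the range as written accepts characters that are not hexadecimal digits.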
It's important to understand what the metacharacters/sequences are. Some sequences are not meta when used in certain contexts (especially a character class). I've cataloged them for you:
^ -- zero width start of line
() -- grouping/capture
? -- zero or one of the preceding sequence
+ -- one or more of the preceding sequence
* -- zero or more of the preceding sequence
[] -- character class
\w -- alphanumeric characters and _. Opposite of \W
| -- alternation
{} -- length assertion
$ -- zero width end of line
This excludes :, @, and % from having any special/meta meaning in the raw context.
Inside a character class, ] ends the class and - creates a range of characters unless it is at the start or the end of the character class or escaped with a backslash.
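The placement rule for - is easy to verify. A small sketch, again in Python's `re` module as an illustrative assumption, using a made-up input string:

```python
import re

sample = "a-f z"

# Between two characters, '-' builds a range (here: a through f).
as_range = re.findall(r"[a-f]", sample)

# At the start, at the end, or escaped, '-' is a literal hyphen.
at_start = re.findall(r"[-af]", sample)
at_end = re.findall(r"[af-]", sample)
escaped = re.findall(r"[a\-f]", sample)

print(as_range, at_start, at_end, escaped)
```

This is why the classes in the question put the dash first, as in [-\d\w] and [-+_~.\d\w].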
A (? combination starts a grouping assertion. For example, (?: means group but do not capture. This means that the regex /(?:a)/ will match the string "a", but a is not captured for use in replacement or match groups as it would be from /(a)/.
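Here is a minimal sketch of the capturing versus non-capturing difference. It is in Python's `re` module (an assumption for illustration; PHP's preg functions show the same difference in their match arrays), and the `http://host` string is made up.

```python
import re

capturing = re.match(r"(a)", "a")
non_capturing = re.match(r"(?:a)", "a")

# Both patterns match the same text...
print(capturing.group(0), non_capturing.group(0))
# ...but only the plain parentheses produce a numbered group.
print(capturing.groups(), non_capturing.groups())

# Typical use: group the optional scheme without capturing it,
# so that group 1 is the part after the slashes.
rest = re.match(r"(?:[\w]+:)?\/\/(.+)", "http://host").group(1)
print(rest)
```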
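? is also used in lookahead/lookbehind assertions: (?=, (?!, (?<=, and (?<!. A (? followed by a sequence other than the ones mentioned in this section is not a literal ?; it is either another special construct (such as a named group or an inline modifier) or a syntax error.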
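For completeness, a short sketch of the four lookaround forms, written in Python's `re` module as an illustrative assumption (the syntax is the same in PCRE). The sample text is made up.

```python
import re

text = "http://a.com https://b.com ftp://c.com"

# (?=...) positive lookahead: a word that is followed by "://".
schemes = re.findall(r"\w+(?=://)", text)

# (?!...) negative lookahead: "http" that is not followed by "s".
plain_http = re.findall(r"\bhttp(?!s)", text)

# (?<=...) positive lookbehind: host characters that come right after "://".
hosts = re.findall(r"(?<=://)[\w.]+", text)

# (?<!...) negative lookbehind: "://" that is not preceded by "s".
not_after_s = re.findall(r"(?<!s)://", text)

print(schemes, plain_http, hosts, not_after_s)
```

In every case the lookaround only checks the surrounding text; the checked characters are not part of the returned match.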