I'm trying to understand this RegEx statement in detail. It is supposed to validate the filename from an ASP.NET FileUpload control so that only JPEG and GIF files are allowed. It was designed by somebody else and I do not completely understand it. It works fine in Internet Explorer 7.0 but not in Firefox 3.6.
<asp:RegularExpressionValidator id="FileUpLoadValidator" runat="server"
ErrorMessage="Upload Jpegs and Gifs only."
ValidationExpression="^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.jpg|.JPG|.gif|.GIF)$"
ControlToValidate="LogoFileUpload">
</asp:RegularExpressionValidator>
Here's a short explanation:
^ # match the beginning of the input
( # start capture group 1
( # start capture group 2
[a-zA-Z] # match any character from the set {'A'..'Z', 'a'..'z'}
: # match the character ':'
) # end capture group 2
| # OR
( # start capture group 3
\\{2} # match the character '\' and repeat it exactly 2 times
\w+ # match a word character: [a-zA-Z_0-9] and repeat it one or more times
) # end capture group 3
\$? # match the character '$' and match it once or none at all
) # end capture group 1
( # start capture group 4
\\ # match the character '\'
( # start capture group 5
\w # match a word character: [a-zA-Z_0-9]
[\w] # match any character from the set {'0'..'9', 'A'..'Z', '_', 'a'..'z'}
.* # match any character except line breaks and repeat it zero or more times
) # end capture group 5
) # end capture group 4
( # start capture group 6
. # match any character except line breaks
jpg # match the characters 'jpg'
| # OR
. # match any character except line breaks
JPG # match the characters 'JPG'
| # OR
. # match any character except line breaks
gif # match the characters 'gif'
| # OR
. # match any character except line breaks
GIF # match the characters 'GIF'
) # end capture group 6
$ # match the end of the input
EDIT
As some of the comments requested: the above was generated by a little tool I wrote. You can download it here: http://www.big-o.nl/apps/pcreparser/pcre/PCREParser.html (WARNING: heavily under development!)
EDIT 2
It will match strings like these:
x:\abc\def\ghi.JPG
c:\foo\bar.gif
\\foo$\baz.jpg
Here's what the groups 1, 4 and 6 match individually:
group 1 | group 4      | group 6
--------+--------------+--------
x:      | \abc\def\ghi | .JPG
c:      | \foo\bar     | .gif
\\foo$  | \baz         | .jpg
Note that it also matches a string like c:\foo\bar@gif, since the DOT matches any character (except line breaks). And it will reject a string like c:\foo\bar.Gif (capital G in gif).
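The group table and the two edge cases above can be checked with a short script. This is only a sketch in Python rather than ASP.NET; the helper name split_groups is made up for illustration, and it assumes Python's re module treats this particular pattern the same way the validator's regex engine does.

```python
import re

# The ValidationExpression from the question, unchanged.
PATTERN = re.compile(
    r"^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.jpg|.JPG|.gif|.GIF)$"
)


def split_groups(path):
    """Return (group 1, group 4, group 6), or None when the path is rejected."""
    m = PATTERN.match(path)
    return (m.group(1), m.group(4), m.group(6)) if m else None


samples = [
    r"x:\abc\def\ghi.JPG",
    r"c:\foo\bar.gif",
    r"\\foo$\baz.jpg",
    r"c:\foo\bar@gif",   # accepted: the unescaped '.' matches '@'
    r"c:\foo\bar.Gif",   # rejected: mixed case is not listed
]

for path in samples:
    print(path, "->", split_groups(path))
```

Running it should show the same split as the table, with "@gif" captured as group 6 for the fourth sample and None for the last one.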
This is a bad regex.
^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.jpg|.JPG|.gif|.GIF)$
Let's do it part by part.
([a-zA-Z]:)
This requires the file path to start with a drive letter like C:, d:, etc.
(\\{2}\w+)\$?
\\{2} means the backslash repeated twice (note that the \ needs to be escaped), followed by some word characters (\w+: letters, digits or underscore), and then maybe a dollar sign (\$?). This is the host part of a UNC path.
(([a-zA-Z]:)|(\\{2}\w+)\$?)
The | means "or". So the path starts with either a drive letter or a UNC host. Congratulations on kicking out non-Windows users.
(\\(\w[\w].*))
This should be the directory part of the path, but it actually is a backslash, 2 word characters, and then anything except newlines (.*), like \ab!@#*(#$*).
The proper regex for this part should be (?:\\\w+)+
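To see the difference between the two directory patterns, here is a small Python sketch. The function names are hypothetical, and fullmatch is used so that each pattern is tested against the directory part on its own.

```python
import re

# Directory part as written in the original expression.
ORIGINAL_DIRS = re.compile(r"\\(\w[\w].*)")
# Suggested replacement: one or more "\name" segments of word characters.
SUGGESTED_DIRS = re.compile(r"(?:\\\w+)+")


def original_accepts(text):
    return ORIGINAL_DIRS.fullmatch(text) is not None


def suggested_accepts(text):
    return SUGGESTED_DIRS.fullmatch(text) is not None


for part in [r"\abc\def", r"\a", r"\ab!@#*(#$*)", r"\dummy\..\C:", r"\My Pictures"]:
    print(part, original_accepts(part), suggested_accepts(part))
```

One caveat: since \w does not match spaces, hyphens or dots, the suggested pattern also rejects a legitimate folder such as \My Pictures. Whether that is acceptable depends on the paths you expect.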
(.jpg|.JPG|.gif|.GIF)$
This means the last 3 characters of the path must be jpg, JPG, gif or GIF, preceded by any one character. Note that . is not a literal dot but matches anything except \n, so a filename like haha.abcgif or malicious.exe\0gif will pass.
The proper regex for this part should be \.(?:jpg|JPG|gif|GIF)$
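A quick Python sketch of the two extension patterns side by side. The helper names loose_ok and strict_ok are made up for this example, and \x00 stands in for the \0 (NUL) character in the filename above.

```python
import re

# Original: the unescaped '.' is a wildcard for any character except a newline.
LOOSE = re.compile(r"(.jpg|.JPG|.gif|.GIF)$")
# Suggested: '\.' only matches a literal dot.
STRICT = re.compile(r"\.(?:jpg|JPG|gif|GIF)$")


def loose_ok(name):
    return LOOSE.search(name) is not None


def strict_ok(name):
    return STRICT.search(name) is not None


for name in ["photo.jpg", "haha.abcgif", "malicious.exe\x00gif"]:
    print(repr(name), loose_ok(name), strict_ok(name))
```

Only photo.jpg should pass both checks; the other two pass the original pattern but not the suggested one.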
Together,
^(([a-zA-Z]:)|(\\{2}\w+)\$?)(\\(\w[\w].*))(.jpg|.JPG|.gif|.GIF)$
will match
D:\foo.jpg
\\remote$\dummy\..\C:\Windows\System32\Logo.gif
C:\Windows\System32\cmd.exe;--gif
and will fail
/home/user/pictures/myself.jpg
C:\a.jpg
C:\d\e.jpg
The proper regex is /\.(?:jpg|gif)$/i, combined with a server-side check that the uploaded file really is an image.
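As a rough illustration of both halves of that advice, here is a Python sketch (not ASP.NET code). The function names are hypothetical. The content check only looks at the leading signature bytes of JPEG and GIF files, which is a cheap first filter and an assumption about what "really an image" should mean; it is not full image validation.

```python
import re

# Same idea as /\.(?:jpg|gif)$/i: literal dot, case-insensitive extension.
EXTENSION = re.compile(r"\.(?:jpg|gif)$", re.IGNORECASE)

# Leading signature bytes of the two formats.
JPEG_MAGIC = b"\xff\xd8\xff"
GIF_MAGICS = (b"GIF87a", b"GIF89a")


def has_allowed_extension(filename):
    """Extension check only; works for full paths and bare filenames alike."""
    return EXTENSION.search(filename) is not None


def looks_like_jpeg_or_gif(data):
    """Server-side sanity check on the first bytes of the uploaded content."""
    return data.startswith(JPEG_MAGIC) or data.startswith(GIF_MAGICS)


for name in ["/home/user/pictures/myself.jpg", r"C:\a.jpg", r"c:\foo\bar.Gif",
             r"C:\Windows\System32\cmd.exe;--gif"]:
    print(name, has_allowed_extension(name))
```

Because the pattern no longer cares about drive letters or UNC hosts, it gives the same answer whether the control reports a full Windows path, a Unix path, or just a filename.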