Converting wildcard pattern to regular expression

Tags: c#, regex, wildcard

I am new to regular expressions. Recently I was given the task of converting a wildcard pattern to a regular expression, which will be used to check whether a file path matches.

For example, my pattern might be *.jpg;*.png;*.bmp

I was able to generate the regex by splitting on semicolons, escaping each part, and replacing the escaped * with .*

String regex = "((?i)" + Regex.Escape(extension).Replace("\\*", ".*") + "$)";

So my resulting regex for jpg will be ((?i).*\.jpg$). Then I combine all my extensions using the OR operator.

Thus my final expression for this example will be:

((?i).*\.jpg$)|((?i).*\.png$)|((?i).*\.bmp$)

I have tested it and it works, but I am not sure whether I should add or remove anything to cover other cases, or whether there is a better way to format the whole thing.

Also bear in mind that I can encounter a wildcard like *myfile.jpg, which should match all files whose names end with myfile.jpg.

I can also encounter combined patterns like *myfile.jpg;*.png;*.bmp

Zaid Amir asked Oct 02 '12 19:10


1 Answer

There's a lot of grouping going on there that isn't really needed. Unless there's something you haven't mentioned, this regex would do the same with less:

 /.*\.(jpg|png|bmp)$/i

That's in regex notation; in C# that would be:

Regex regex = new Regex(@".*\.(jpg|png|bmp)$", RegexOptions.IgnoreCase);
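
As a quick sanity check, the expression can be wrapped in a small helper and tried against a few paths. This is only a sketch: the class name ImageFileCheck and the method IsImage are made up for illustration and are not part of the question.

```csharp
using System.Text.RegularExpressions;

public static class ImageFileCheck
{
    // Same expression as above: any path ending in .jpg, .png or .bmp, ignoring case.
    private static readonly Regex ImageFile =
        new Regex(@".*\.(jpg|png|bmp)$", RegexOptions.IgnoreCase);

    // Returns true when the path ends with one of the three extensions.
    public static bool IsImage(string path)
    {
        return ImageFile.IsMatch(path);
    }
}
```

With this, ImageFileCheck.IsImage(@"C:\pics\holiday.JPG") should be true, while "notes.txt" and "scan.bmp.bak" should not match because neither ends in one of the listed extensions.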

If you have to programmatically translate between the two, you've started on the right track: split by semicolon, then group your extensions into the set (without the preceding dot). If your wildcard patterns can be more complicated (extensions with wildcards, multi-wildcard starting matches) it might need a bit more work ;)
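
For the simple case where every entry looks like *.ext, the split-and-group step might be sketched as follows. The names ExtensionListConverter and ToRegex are hypothetical, and the code assumes it is acceptable to reject any entry that is not of the form *.ext.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class ExtensionListConverter
{
    // Turns "*.jpg;*.png;*.bmp" into the pattern .*\.(jpg|png|bmp)$
    // Assumption: every entry has the form "*.ext"; anything else throws.
    public static Regex ToRegex(string wildcardList)
    {
        string[] extensions = wildcardList
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Trim())
            .Select(p =>
            {
                bool simple = p.Length > 2
                    && p.StartsWith("*.", StringComparison.Ordinal)
                    && p.IndexOf('*', 1) < 0;
                if (!simple)
                    throw new ArgumentException("Expected '*.ext' but got: " + p);
                // Drop the leading "*." and escape whatever is left.
                return Regex.Escape(p.Substring(2));
            })
            .ToArray();

        string pattern = @".*\.(" + string.Join("|", extensions) + ")$";
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }
}
```

Escaping each extension keeps characters such as + or ( in an extension from being read as regex syntax.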

Edit: (For your update)

If the wildcards can be more complicated, then you're almost there. My code above has an optimization that pulls the dot out of the group (since every entry was an extension); it has to be put back in, so you'd end up with:

 /.*(myfile\.jpg|\.png|\.bmp)$/i

Basically '*' -> '.*', '.' -> '\.' (it gets escaped), and the rest goes into the set. In other words, it says: match anything ending (the dollar sign anchors to the end) in myfile.jpg, .png or .bmp.
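
Those rules could be put together roughly like this. It is a minimal sketch, not a definitive implementation: WildcardConverter and ToRegex are hypothetical names, only the * wildcard is translated (a ? in the input is escaped and treated literally), and the .* is left inside each alternative instead of being factored out in front of the group, which matches the same paths.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class WildcardConverter
{
    // Turns "*myfile.jpg;*.png;*.bmp" into (?:.*myfile\.jpg|.*\.png|.*\.bmp)$
    public static Regex ToRegex(string wildcardList)
    {
        var alternatives = wildcardList
            .Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries)
            // Regex.Escape turns '.' into '\.' and '*' into '\*';
            // the escaped star is then swapped for '.*'.
            .Select(p => Regex.Escape(p.Trim()).Replace(@"\*", ".*"));

        // One non-capturing group so that the single $ applies to every alternative.
        string pattern = "(?:" + string.Join("|", alternatives) + ")$";
        return new Regex(pattern, RegexOptions.IgnoreCase);
    }
}
```

The pattern is anchored only at the end, as in the expression above. If a wildcard such as a*.jpg must match the whole file name from its first character, a ^ (or a path-separator check) would need to be added in front of the group.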

Rudu answered Sep 30 '22 09:09