Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Why does checking this string with Regex.IsMatch cause CPU to reach 100%?

Tags:

c#

.net

regex

When using Regex.IsMatch (C#, .Net 4.5) on a specific string, the CPU reaches 100%.

String:

https://www.facebook.com/CashKingPirates/photos/a.197028616990372.62904.196982426994991/1186500984709792/?type=1&permPage=1

Pattern:

^http(s)?://([\w-]+.)+[\w-]+(/[\w- ./?%&=])?$

Full code:

Regex.IsMatch("https://www.facebook.com/CashKingPirates/photos/a.197028616990372.62904.196982426994991/1186500984709792/?type=1&permPage=1",
                @"^http(s)?://([\w-]+.)+[\w-]+(/[\w- ./?%&=])?$");

I found that redacting the URL prevents this problem. Redacted URL:

https://www.facebook.com/CashKingPirates/photos/a.197028616990372.62904.196982426994991/1186500984709792

But still very interested in understanding what causes this.

like image 207
Nir Avatar asked Jul 05 '15 06:07

Nir


People also ask

How does regex IsMatch work?

IsMatch(String, String, RegexOptions, TimeSpan) Indicates whether the specified regular expression finds a match in the specified input string, using the specified matching options and time-out interval.

Which is faster regex or string contains?

Contains will almost certainly be faster and use less memory. And in of course, there's no reason to use regex here. Save this answer.

Is regex faster than IndexOf?

IndexOf is probably faster than regex in most cases. Especially if you don't use a precompiled regex. It's performance might also depend on the chosen string comparison/culture.

How do you check if a string contains a pattern in C#?

Contains() Method. In C#, String. Contains() is a string method. This method is used to check whether the substring occurs within a given string or not.


1 Answers

As nu11p01n73R pointed out, you have a lot backtracking with your regular expression. That’s because parts of your expression can all match the same thing, which gives the engine many choices it has to try before finding a result.

You can avoid this by changing the regular expression to make individual sections more specific. In your case, the cause is that you wanted to match a real dot but used the match-all character . instead. You should escape that to \..

This should already reduce the backtracking need a lot and make it fast:

^http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=])?$

And if you want to actually match the original string, you need to add a quantifier to the character class at the end:

^http(s)?://([\w-]+\.)+[\w-]+(/[\w- ./?%&=]+)?$
                                           ↑
like image 187
poke Avatar answered Sep 27 '22 20:09

poke