I'm trying to find a regex that matches URLs with three or more repeating segments (and may include any number of directories) such as:
s1 = 'http://www.foo.com/bar/bar/bar/'s2 = 'http://www.foo.com/baz/biz/baz/biz/baz/biz/etc's3 = '/foo/bar/foo/bar/foo/bar/'and not match URLs like:
s4 = '/foo/bar/foo/bar/foo/barbaz'First I tried:
re1 = /((.+\/)+)\1\1/
which works:
re1 === s1 #=> true
re1 === s2 #=> true
but as the number of segments grows, the regex match takes exponentially longer:
require 'benchmark'
Benchmark.bm do |b|
(10..15).each do |num|
str = '/foo/bar' * num
puts str
b.report("#{num} repeats:") { /((.+\/)+)\1\1/ === str }
end
end
user system total real
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
10 repeats: 0.060000 0.000000 0.060000 ( 0.054839)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
11 repeats: 0.210000 0.000000 0.210000 ( 0.213492)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
12 repeats: 0.870000 0.000000 0.870000 ( 0.871879)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
13 repeats: 3.370000 0.010000 3.380000 ( 3.399224)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
14 repeats: 13.580000 0.110000 13.690000 ( 13.790675)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
15 repeats: 54.090000 0.210000 54.300000 ( 54.562672)
Then, I tried a regex similar to one given here:
re2 = /(\/.+)(?=.*\1)\1\1/
which doesn't have performance issues, and matches the strings I'd like to to match:
re2 === s3 #=> true
but also matches strings I don't want it to match, such as:
re2 === s4 #=> true, but should be false
I'm close with the second regex. What am I missing?
Change . to [^\/]. This should decrease the complexity of the regex since it won't be trying to match "any" character.
require 'benchmark'
Benchmark.bm do |b|
(10..15).each do |num|
str = '/foo/bar' * num
puts str
b.report("#{num} repeats:") { /(([^\/]+\/)+)\1\1/ === str }
end
end
10 repeats: 0.000000 0.000000 0.000000 ( 0.000015)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
11 repeats: 0.000000 0.000000 0.000000 ( 0.000004)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
12 repeats: 0.000000 0.000000 0.000000 ( 0.000004)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
13 repeats: 0.000000 0.000000 0.000000 ( 0.000004)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
14 repeats: 0.000000 0.000000 0.000000 ( 0.000004)
/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar/foo/bar
15 repeats: 0.000000 0.000000 0.000000 ( 0.000005)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With