
Algorithm (or regular expression) needed to find multiple instances of anything

I'm not sure if there is a simple way of doing this, but is there a way to find repeated substrings in a string whose contents aren't known in advance? For example:

hellohellohellobyebyebyehello

Without knowing the value of the above string, can I return something that will tell me that there are 3 instances of "hello" and 3 instances of "bye"? (I'm not worried about the last "hello", as I'm only looking for consecutive repetition.) Thanks in advance!

Copper asked Feb 24 '10 14:02


3 Answers

Maybe the Sequitur algorithm can help: http://sequitur.info/

tur1ng answered Sep 22 '22 10:09


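var s = "hellohellohellobyebyebyehello";
// (.+) captures a unit; (\1+) matches one or more further copies right after it.
// replace() is used here only to visit each match; its return value is ignored.
// Note: this also reports "l repeated 2 times" for the "ll" in the trailing "hello".
s.replace(/(.+)(\1+)/g, function($0, $1) {
    console.log($1 + " repeated " + ($0.length / $1.length) + " times");
});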
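
If you'd rather collect the results than print them, something along these lines might work. This is only a sketch: findRepeats and its minUnitLength parameter are names made up for illustration, and it assumes an environment with String.prototype.matchAll (ES2020). The minimum length is there to skip trivial hits such as the "ll" inside a lone "hello".

```javascript
// Hypothetical helper: lists each run of consecutive repeats as {unit, count, index}.
function findRepeats(s, minUnitLength = 1) {
    const runs = [];
    // A captured unit followed by one or more immediate copies of itself.
    for (const m of s.matchAll(/(.+)\1+/g)) {
        const unit = m[1];
        // Skip units shorter than the requested minimum (assumed filter, not in the original).
        if (unit.length < minUnitLength) continue;
        runs.push({ unit: unit, count: m[0].length / unit.length, index: m.index });
    }
    return runs;
}

// Without a minimum, the "ll" in the trailing "hello" is reported as well.
console.log(findRepeats("hellohellohellobyebyebyehello"));
// With a minimum unit length of 2, only "hello" x3 and "bye" x3 remain.
console.log(findRepeats("hellohellohellobyebyebyehello", 2));
```

Note that . does not match line breaks by default, so runs spanning several lines would need the s flag.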
user187291 answered Sep 26 '22 10:09


"testhellohellohellobyebyebyehello".match(/(.+)\1+/)

This says: match a sequence of at least one character, (.+), then match that same captured text again, \1, one or more times, +. Since there is no g flag, only the first (leftmost) match is returned.

It will return ["hellohellohello", "hello"], meaning "hellohellohello" matches the full expression (index 0) and "hello" matches capture group 1 (the thing we reference with \1).

Caveat:
Something like "hahahaha" will yield ["hahahaha", "haha"] instead of ["hahahaha", "ha"], because .+ is greedy and grabs the longest unit that repeats. So you'll need to combine the above with some post-processing to get your desired result.
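
One possible shape for that post-processing, as a rough sketch: shrink the captured unit to its shortest repeating block, then recount the copies from the start of the match. The recount matters because the greedy match can stop early (five copies of "ha" match as just two copies of "haha"). The names smallestUnit and firstRepeat are invented here for illustration.

```javascript
// Hypothetical helper: shortest block that rebuilds `unit` when repeated.
function smallestUnit(unit) {
    for (let len = 1; len <= unit.length; len++) {
        if (unit.length % len !== 0) continue;
        const candidate = unit.slice(0, len);
        if (candidate.repeat(unit.length / len) === unit) return candidate;
    }
    return unit; // only reached for an empty string
}

// Hypothetical helper: first consecutive repetition, reduced to its smallest unit.
function firstRepeat(s) {
    const m = s.match(/(.+)\1+/);
    if (m === null) return null;
    const unit = smallestUnit(m[1]);
    // Walk forward from the match start and count every copy of the reduced unit.
    let count = 0;
    let pos = m.index;
    while (s.startsWith(unit, pos)) {
        count++;
        pos += unit.length;
    }
    return { unit: unit, count: count, index: m.index };
}

console.log(firstRepeat("hahahaha"));   // unit "ha", count 4
console.log(firstRepeat("hahahahaha")); // unit "ha", count 5
console.log(firstRepeat("testhellohellohellobyebyebyehello")); // unit "hello", count 3
```

A lazy quantifier, /(.+?)\1+/, is a shorter alternative that also yields "ha" here, but it prefers the shortest unit everywhere, so "aabaab" would be reported as "a" repeated twice rather than "aab" repeated twice.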

Mark Bolusmjak answered Sep 23 '22 10:09