
How to terminate Matcher.find() when it's running too long?

I'm wondering about techniques for terminating long-running regular expression matches (Java's Matcher.find() method). Maybe subclassing Matcher and adding some logic to terminate after x iterations?

Basically, I'm generating regular expressions using a genetic algorithm, so I don't have a lot of control over them. Then I test each one against some text to see if it matches a certain target area of the text.

Since the regular expressions are more or less randomly generated, some of them turn out pathological: they eat a ton of CPU, and some find() calls take a long time to return. I'd rather just kill them after a while, but I'm not sure of the best way to do that.

So if anyone has ideas, please let me know.

Fraggle asked Aug 19 '11 18:08


1 Answer

There is a solution here that should solve your problem. (That question covers the same problem as yours.)

Essentially, it's a CharSequence that can notice thread interrupts.

The code from that answer:

/**
 * CharSequence that notices thread interrupts -- as might be necessary
 * to recover from a loose regex on unexpected challenging input.
 *
 * @author gojomo
 */
public class InterruptibleCharSequence implements CharSequence {
    private final CharSequence inner;
    // public long counter = 0;

    public InterruptibleCharSequence(CharSequence inner) {
        this.inner = inner;
    }

    @Override
    public char charAt(int index) {
        // The regex engine reads its input through charAt(), so this check
        // runs repeatedly while a match is in progress.
        if (Thread.interrupted()) { // clears flag if set
            throw new RuntimeException(new InterruptedException());
        }
        // counter++;
        return inner.charAt(index);
    }

    @Override
    public int length() {
        return inner.length();
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        // Keep sub-sequences interruptible as well.
        return new InterruptibleCharSequence(inner.subSequence(start, end));
    }

    @Override
    public String toString() {
        return inner.toString();
    }
}

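Wrap your input string in this class before handing it to the matcher, and run the match on its own thread. When you interrupt that thread, the next charAt() call throws a RuntimeException, which aborts find().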
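As a rough illustration of how this might be wired up, here is a minimal sketch that assumes each match runs on a worker thread managed by an ExecutorService. The class name RegexTimeoutDemo, the helper findWithTimeout, and the sample pattern and timeout are made up for this example and are not part of the linked answer; the wrapper is repeated in compact form only so the snippet compiles on its own.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Pattern;

// Hypothetical demo class; all names here are illustrative assumptions.
public class RegexTimeoutDemo {

    // Compact copy of the wrapper so this sketch is self-contained.
    static final class InterruptibleCharSequence implements CharSequence {
        private final CharSequence inner;

        InterruptibleCharSequence(CharSequence inner) {
            this.inner = inner;
        }

        @Override
        public char charAt(int index) {
            if (Thread.interrupted()) { // clears the flag if set
                throw new RuntimeException(new InterruptedException());
            }
            return inner.charAt(index);
        }

        @Override
        public int length() {
            return inner.length();
        }

        @Override
        public CharSequence subSequence(int start, int end) {
            return new InterruptibleCharSequence(inner.subSequence(start, end));
        }

        @Override
        public String toString() {
            return inner.toString();
        }
    }

    /**
     * Runs find() on a worker thread. Returns the result of find() if it
     * completes in time, or an empty Optional if it was abandoned.
     */
    static Optional<Boolean> findWithTimeout(Pattern pattern, CharSequence text, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Callable<Boolean> task =
                () -> pattern.matcher(new InterruptibleCharSequence(text)).find();
        Future<Boolean> result = executor.submit(task);
        try {
            return Optional.of(result.get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (TimeoutException e) {
            // cancel(true) interrupts the worker; its next charAt() call throws.
            result.cancel(true);
            return Optional.empty();
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Nested quantifiers like this are a classic source of heavy backtracking.
        Pattern risky = Pattern.compile("(a+)+$");
        String text = "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!";
        Optional<Boolean> outcome = findWithTimeout(risky, text, 500);
        System.out.println(outcome.map(String::valueOf).orElse("gave up after timeout"));
    }
}
```

An empty Optional means the match was abandoned, which in a genetic algorithm setting could simply be scored as a failed candidate. Creating one executor per call keeps the sketch short; with many candidates you would probably want to reuse a thread pool.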

Reverend Gonzo answered Sep 28 '22 06:09