
Infer a regex pattern from a set of strings: I need an algorithm in Java [duplicate]

I want to convert a set of strings into a regular expression using Java.

I searched extensively but could not find a satisfying answer that resolves my issue, so I am asking here.

First, is this possible at all? If yes, kindly suggest how to do it.

Suppose I have a set of strings:

abb
abababb
babb
aabb
bbbbabb
...

and I want to derive a regular expression for them, such as

(a+b)*abb

How can this be done?

Sabaoon Bedar asked Jan 26 '26 15:01


2 Answers

If you have a collection of strings and want to build a regex that matches any of them, join them with the | alternation (OR) operator.

Since the strings could contain regex special characters, they need to be quoted.

To make sure the best string matches, the longest strings must come first, because alternatives are tried from left to right. E.g. if aba and abax are both on the list and the text to scan contains abax, we'd want to match the second string, not the first one.

So, you can do it like this:

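import java.util.Comparator;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Builds an alternation of the quoted input strings, longest first.
public static String toRegex(Iterable<String> strings) {
    return StreamSupport.stream(strings.spliterator(), false)
            .sorted(Comparator.comparingInt(String::length).reversed())
            .map(Pattern::quote)
            .collect(Collectors.joining("|"));
}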
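A minimal usage sketch, assuming the method is placed in a class (hypothetically named RegexFromStrings here) and fed the strings from the question. It shows that the resulting pattern matches exactly the listed strings and nothing more:

```java
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical wrapper class, used only for this illustration.
public class RegexFromStrings {

    // Same approach as above: quoted alternatives, longest first.
    public static String toRegex(Iterable<String> strings) {
        return StreamSupport.stream(strings.spliterator(), false)
                .sorted(Comparator.comparingInt(String::length).reversed())
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
    }

    public static void main(String[] args) {
        List<String> examples = List.of("abb", "abababb", "babb", "aabb", "bbbbabb");
        String regex = toRegex(examples);
        System.out.println(regex);

        Pattern pattern = Pattern.compile(regex);
        // Every listed string matches in full.
        for (String s : examples) {
            System.out.println(s + " -> " + pattern.matcher(s).matches());
        }
        // A string that is not on the list does not match, even though
        // it ends in abb like all the examples.
        System.out.println("ababb -> " + pattern.matcher("ababb").matches());
    }
}
```

Note that this only enumerates the given strings; it does not generalize them into a pattern such as the one asked for in the question.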
Andreas answered Jan 28 '26 05:01


What you are looking for is a way to infer a regular expression from a set of examples. This is a non-trivial computing problem to solve for the general case. See this post for details.
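One way to see why the general case is hard: many different patterns are consistent with the same finite set of examples, so the examples alone do not determine a single answer. The sketch below checks a few hand-written candidates against the strings from the question. The class name, the helper matchesAll, and the candidate patterns are all hypothetical, chosen for illustration. It also assumes that (a+b)*abb in the question is formal-language notation, where + means "or"; the Java regex equivalent would then be (a|b)*abb.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for this illustration.
public class CandidateCheck {

    // Returns true if the candidate regex fully matches every example.
    public static boolean matchesAll(String candidate, List<String> examples) {
        Pattern pattern = Pattern.compile(candidate);
        return examples.stream().allMatch(s -> pattern.matcher(s).matches());
    }

    public static void main(String[] args) {
        List<String> examples = List.of("abb", "abababb", "babb", "aabb", "bbbbabb");
        // Hand-written candidates, from most general to most specific.
        List<String> candidates = List.of(
                ".*",
                "[ab]+",
                "(a|b)*abb",
                "abb|abababb|babb|aabb|bbbbabb");
        for (String candidate : candidates) {
            System.out.println(candidate + " -> " + matchesAll(candidate, examples));
        }
    }
}
```

All four candidates accept every example, so picking one requires an extra criterion, such as strings that must not match or a preference for the shortest pattern.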

