
Finding a pattern in a set of values in Java

Is there a way to extract a common pattern in a list of strings in Java?

For example, if we have a list of values:

001-L1
002-L2
003-L3
004-L4
...

Is there a way to deduce that we have three digits, followed by '-', then the letter L, and finally a numeric character?

I think it has something to do with common substrings or something like that but I haven't been able to find anything yet.

Thank you!

EDIT: Obviously it won't be a perfect recognition, it'll just return a recommendation based on the data.

What I'm trying to build is something close to this. In the video, when the user clicks on the column, there's a recommendation to split the data on ":".

Raphael Khoury asked Nov 23 '16 08:11




1 Answer

I think you may want to "deduce" the pattern that a set of strings might have in common, and not validate them using regex. This problem may belong to pattern recognition.

  • First, apply the Longest Common Substring (not Longest Common Subsequence) algorithm to any two of your strings. Note that with your list of strings you may get two longest common substrings of equal length, 00 and -L, so you need to handle ties.
  • Then, once you have a common substring, simply use the contains() method to check whether the other strings contain it as well.

This method works well only when the common pattern between the strings is at least a few characters long.
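
A minimal sketch of the two steps above, assuming a plain dynamic-programming longest common substring that keeps every tied result. The class and method names (CommonSubstringSketch, longestCommonSubstrings, commonToAll) are made up for illustration and are not from any library:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class CommonSubstringSketch {

    // Step 1: every longest common substring of a and b, ties included.
    // len[i][j] is the length of the common run ending at a[i-1] and b[j-1].
    static Set<String> longestCommonSubstrings(String a, String b) {
        int[][] len = new int[a.length() + 1][b.length() + 1];
        int best = 0;
        Set<String> result = new LinkedHashSet<>();
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                if (a.charAt(i - 1) != b.charAt(j - 1)) {
                    continue; // no common run ends here, len[i][j] stays 0
                }
                len[i][j] = len[i - 1][j - 1] + 1;
                if (len[i][j] > best) {
                    best = len[i][j];
                    result.clear(); // a strictly longer run replaces earlier ties
                }
                if (len[i][j] == best) {
                    result.add(a.substring(i - best, i));
                }
            }
        }
        return result;
    }

    // Step 2: keep only the candidates that contains() finds in every value.
    // Assumption: the first two values are used as the pair to compare.
    static List<String> commonToAll(List<String> values) {
        List<String> kept = new ArrayList<>();
        if (values.size() < 2) {
            return kept;
        }
        for (String candidate : longestCommonSubstrings(values.get(0), values.get(1))) {
            boolean inAll = true;
            for (String value : values) {
                if (!value.contains(candidate)) {
                    inAll = false;
                    break;
                }
            }
            if (inAll) {
                kept.add(candidate);
            }
        }
        return kept;
    }
}
```

For the list in the question, commonToAll returns both 00 and -L. If a value such as 010-L5 were added, only -L would survive the contains() check.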

EDIT:

If you want to implement something like in the given video, you just need to split the strings based on a certain delimiter. An easy and naive approach:

  • Create a list of possible delimiters, such as ":", ".", "-", ",", "::", etc.
  • Search all your strings for occurrences of each delimiter. The LCS algorithm would not work here, as the strings might share common data values (like "Yes" and "No" in the video) which are not intended as delimiters.
  • Split the strings on a delimiter if it is found in all (or even most) of the strings!
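
The naive approach above could be sketched roughly as follows. The candidate list, the minShare threshold (the fraction of strings that must contain the delimiter), and the names DelimiterSketch, suggestDelimiter and splitOn are all assumptions made for this example:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;

class DelimiterSketch {

    // Hypothetical candidate list; "::" comes before ":" so the longer one wins a tie.
    static final List<String> CANDIDATES = Arrays.asList("::", ":", ",", "-", ".");

    // Suggests the candidate found in the most values, provided it appears in
    // at least minShare (0.0 to 1.0) of them; otherwise suggests nothing.
    static Optional<String> suggestDelimiter(List<String> values, double minShare) {
        String best = null;
        long bestHits = 0;
        for (String delimiter : CANDIDATES) {
            long hits = values.stream().filter(v -> v.contains(delimiter)).count();
            if (hits > bestHits) { // strict '>' keeps the earlier candidate on ties
                best = delimiter;
                bestHits = hits;
            }
        }
        if (best != null && (double) bestHits / values.size() >= minShare) {
            return Optional.of(best);
        }
        return Optional.empty();
    }

    // String.split takes a regex, so the delimiter is quoted to treat "." literally.
    static String[] splitOn(String value, String delimiter) {
        return value.split(Pattern.quote(delimiter));
    }
}
```

With the values from the question and minShare set to 1.0, suggestDelimiter recommends "-", and splitOn("001-L1", "-") gives 001 and L1. A lower minShare corresponds to the "or even most" case.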

There might be more optimal solutions than this one!

skrtbhtngr answered Oct 21 '22 13:10
