Java REGEX to match an exact number of digits in a string

Tags: java, regex

I searched the question history for an answer to my problem, but the search returned more than a thousand results, and after scanning a few dozen of them I gave up. So here is my problem.

I want to be able to find the first sequence of exactly six digits in a string. Given the string “Some text 987654321 and some more text 123456 and some other text again 654321 and more text in the end” I want to find the regex that will match the 123456 sequence.

I am new to regex, so a short explanation of how it works would help a lot.

Thank you in advance.

Julian asked Mar 09 '12 02:03


4 Answers

You can use the pattern (?<!\d)\d{6}(?!\d), which means "a string-position that is not preceded by a digit; followed by exactly six digits; followed by a string-position that is not followed by a digit". (The notation (?<!...), known as a negative lookbehind assertion, means "not preceded by ...". The notation (?!...), known as a negative lookahead assertion, means "not followed by ...". The notation \d means a digit. The notation {n} means "n times", so that e.g. \d{6} means "six digits".)

That could look like this:

// Needs java.util.regex.Matcher and java.util.regex.Pattern;
// input is the String to search.
final String number;
{
    final Matcher m = Pattern.compile("(?<!\\d)\\d{6}(?!\\d)").matcher(input);
    if (m.find())
        number = m.group(); // retrieve the matched substring
    else
        number = null; // no match found
}

Note: a previous version of this answer suggested the use of word boundaries, \b; but one of your comments suggests that the digits might be immediately preceded or followed by Traditional Chinese characters, which are considered word characters (and therefore wouldn't trigger a word boundary), so I've changed that.
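To make the note above concrete, here is a minimal sketch comparing the two patterns on a hypothetical input where the digits sit directly between two Chinese characters. The class name, helper method and input are made up for illustration. It assumes the `Pattern.UNICODE_CHARACTER_CLASS` flag so that `\b` treats those characters as word characters; without the flag, the result for `\b` next to non-ASCII letters may vary between Java releases.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CjkNeighbourDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, int flags, String input) {
        Matcher m = Pattern.compile(regex, flags).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical input: six digits directly between U+7B2C and U+865F,
        // written as escapes to avoid source-encoding issues.
        String input = "\u7B2C123456\u865F";
        // Assumption: make \b, \w and \d follow Unicode definitions.
        int unicode = Pattern.UNICODE_CHARACTER_CLASS;

        // No word boundary exists between a Chinese character and a digit.
        System.out.println(firstMatch("\\b\\d{6}\\b", unicode, input));          // null
        // The lookarounds only care whether the neighbours are digits.
        System.out.println(firstMatch("(?<!\\d)\\d{6}(?!\\d)", unicode, input)); // 123456
    }
}
```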

ruakh answered Oct 24 '22 04:10


The pattern you’re looking for is:

(?x)              # enable comments
(?<! \p{Nd} )     # no decimal number before
\p{Nd} {6}        # exactly six repetitions of a decimal number
(?! \p{Nd} )      # no decimal number after

That will also pick up things like

U+FF10 ０ FULLWIDTH DIGIT ZERO
U+FF11 １ FULLWIDTH DIGIT ONE
U+FF12 ２ FULLWIDTH DIGIT TWO
U+FF13 ３ FULLWIDTH DIGIT THREE
U+FF14 ４ FULLWIDTH DIGIT FOUR
U+FF15 ５ FULLWIDTH DIGIT FIVE
U+FF16 ６ FULLWIDTH DIGIT SIX
U+FF17 ７ FULLWIDTH DIGIT SEVEN
U+FF18 ８ FULLWIDTH DIGIT EIGHT
U+FF19 ９ FULLWIDTH DIGIT NINE

In case you have those in Chinese text.
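As a rough illustration of the difference, the sketch below runs an ASCII-only `\d` pattern and a `\p{Nd}` pattern against a made-up string containing fullwidth digits. The class name, the constants and the sample input are hypothetical, and the patterns are written without `(?x)` so they fit on one line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FullwidthDigitsDemo {
    // ASCII digits only (the default meaning of \d in Java).
    static final Pattern ASCII_SIX =
            Pattern.compile("(?<!\\d)\\d{6}(?!\\d)");
    // Any Unicode decimal digit (general category Nd).
    static final Pattern UNICODE_SIX =
            Pattern.compile("(?<!\\p{Nd})\\p{Nd}{6}(?!\\p{Nd})");

    // Returns the first match of p in input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical input: fullwidth digits U+FF11..U+FF16 between two words.
        String input = "order \uFF11\uFF12\uFF13\uFF14\uFF15\uFF16 shipped";

        // Prints null: \d does not match fullwidth digits by default.
        System.out.println(firstMatch(ASCII_SIX, input));
        // Prints the six fullwidth digits.
        System.out.println(firstMatch(UNICODE_SIX, input));
    }
}
```

Both patterns give the same result on plain ASCII input, so the `\p{Nd}` version only matters when such digits can actually appear.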

tchrist answered Oct 24 '22 03:10


The first occurrence of 6 digits in the string you posted is actually 987654. If you mean the first occurrence of 6 digits surrounded by characters that are not digits, then this should work:

(?<!\d)(\d{6})(?!\d)
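A small sketch of that distinction, run against the string from the question. The class and method names are only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstSixDigitsDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "Some text 987654321 and some more text 123456 "
                + "and some other text again 654321 and more text in the end";

        // Plain \d{6} takes the first six digits of the nine-digit run.
        System.out.println(firstMatch("\\d{6}", input));                // 987654
        // With the lookarounds, every position inside 987654321 is rejected:
        // it is either preceded or followed by another digit.
        System.out.println(firstMatch("(?<!\\d)(\\d{6})(?!\\d)", input)); // 123456
    }
}
```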

EDIT: This approach uses a negative lookbehind and a negative lookahead. It differs slightly from the word boundary approach in that it will match 123456 in the following strings:

123456asdf some text hello

another string a123456 aaaaaaaa

If the numbers will always be surrounded by spaces then the word boundary approach is probably better.
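To see the difference on the two strings above, here is a minimal sketch that tries both patterns on each of them, plus a third made-up string where the digits are surrounded by spaces. The class name, helper and third input are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryVsLookaroundDemo {
    static final String WORD_BOUNDARY = "\\b\\d{6}\\b";
    static final String LOOKAROUND = "(?<!\\d)(\\d{6})(?!\\d)";

    // Returns the first match of regex in input, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "123456asdf some text hello",        // letter right after the digits
            "another string a123456 aaaaaaaa",   // letter right before the digits
            "padded 123456 here"                 // hypothetical: spaces on both sides
        };
        for (String input : inputs) {
            // \b needs a word/non-word transition, so an adjacent letter blocks it;
            // the lookarounds only reject adjacent digits.
            System.out.println(firstMatch(WORD_BOUNDARY, input)
                    + " | " + firstMatch(LOOKAROUND, input));
        }
        // prints:
        // null | 123456
        // null | 123456
        // 123456 | 123456
    }
}
```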

takteek answered Oct 24 '22 04:10


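// Splits str on every non-digit character and returns the first
// piece that is exactly num characters long, or null if there is none.
public static String splitting(String str, int num) {
    String[] arr = str.split("[^0-9]");
    for (String s : arr)
        if (s.length() == num)
            return s;
    return null;
}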

Test with:

public static void main(String[] args) {
    String s = "Some text 987654321 and some more text 123456 and some other text again 654321 and more text in the end";
    System.out.println(splitting(s, 6));
}

Output:

123456

kasavbere answered Oct 24 '22 04:10