I tried to find the answer to my problem in the question history, but the search returns more than a thousand results, and after scanning through a few dozen matching answers I gave up. So here is my problem.
I want to be able to find the first sequence of exactly six digits in a string. Given the string “Some text 987654321 and some more text 123456 and some other text again 654321 and more text in the end” I want to find the regex that will match the 123456 sequence.
I am new to regex, so a short explanation of how it works would help a lot.
Thank you in advance
You can use the pattern (?<!\d)\d{6}(?!\d), which means "a string-position that is not preceded by a digit; followed by exactly six digits; followed by a string-position that is not followed by a digit". (The notation (?<!...), known as a negative lookbehind assertion, means "not preceded by ...". The notation (?!...), known as a negative lookahead assertion, means "not followed by ...". The notation \d means a digit. The notation {n} means "n times", so that e.g. \d{6} means "six digits".)
That could look like this:
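import java.util.regex.Matcher;
import java.util.regex.Pattern;

// inside a method; 'input' is the string to search
final String number;
{
    final Matcher m = Pattern.compile("(?<!\\d)\\d{6}(?!\\d)").matcher(input);
    if (m.find()) {
        number = m.group(); // retrieve the matched substring
    } else {
        number = null; // no match found
    }
}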
Note: a previous version of this answer suggested the use of word boundaries, \b; but one of your comments suggests that the digits might be immediately preceded or followed by Traditional Chinese characters, which are considered word characters (and therefore wouldn't trigger a word boundary), so I've changed that.
The pattern you’re looking for is:
(?x) # enable comments
(?<! \p{Nd} ) # no decimal number before
\p{Nd} {6} # exactly six repetitions of a decimal number
(?! \p{Nd} ) # no decimal number after
That will also pick up things like
U+FF10 ０ FULLWIDTH DIGIT ZERO
U+FF11 １ FULLWIDTH DIGIT ONE
U+FF12 ２ FULLWIDTH DIGIT TWO
U+FF13 ３ FULLWIDTH DIGIT THREE
U+FF14 ４ FULLWIDTH DIGIT FOUR
U+FF15 ５ FULLWIDTH DIGIT FIVE
U+FF16 ６ FULLWIDTH DIGIT SIX
U+FF17 ７ FULLWIDTH DIGIT SEVEN
U+FF18 ８ FULLWIDTH DIGIT EIGHT
U+FF19 ９ FULLWIDTH DIGIT NINE
In case you have those in Chinese text.
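As a rough illustration of the difference, here is a minimal sketch comparing \p{Nd} with \d in Java, where \d only matches ASCII 0-9 unless Unicode character classes are enabled. The class name, method name and sample strings are made up for this example; the fullwidth digits are written as \uFF11 to \uFF16 escapes so the source stays ASCII.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeDigitDemo {
    // \p{Nd} matches any Unicode decimal digit, including fullwidth digits.
    static final Pattern UNICODE_SIX =
            Pattern.compile("(?<!\\p{Nd})\\p{Nd}{6}(?!\\p{Nd})");
    // By default, Java's \d matches only the ASCII digits 0-9.
    static final Pattern ASCII_SIX =
            Pattern.compile("(?<!\\d)\\d{6}(?!\\d)");

    // Returns the first match of p in input, or null if there is none.
    static String firstMatch(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample: six fullwidth digits between ASCII letters.
        String text = "x\uFF11\uFF12\uFF13\uFF14\uFF15\uFF16y";
        System.out.println(firstMatch(UNICODE_SIX, text)); // the six fullwidth digits
        System.out.println(firstMatch(ASCII_SIX, text));   // null
    }
}
```

If the matched digits need to become a number afterwards, Character.digit(ch, 10) accepts fullwidth digits as well.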
The first occurrence of 6 digits in the string you posted is actually 987654. If you mean the first occurrence of 6 digits surrounded by characters that are not digits, then this should work:
(?<!\d)(\d{6})(?!\d)
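To see the difference on the string from the question, here is a minimal sketch that runs a plain \d{6} and the lookaround version side by side. The class and method names are invented for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstSixDigits {
    // Returns the first substring of input matched by regex, or null if none.
    static String find(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "Some text 987654321 and some more text 123456 "
                 + "and some other text again 654321 and more text in the end";
        // Without lookarounds, the first six digits of the longer run match.
        System.out.println(find("\\d{6}", s));                 // 987654
        // With lookarounds, only a run of exactly six digits matches.
        System.out.println(find("(?<!\\d)(\\d{6})(?!\\d)", s)); // 123456
    }
}
```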
EDIT:
This approach uses a negative lookbehind and a negative lookahead. It's slightly different from the word boundary approach in that it will match 123456 in the following strings:
123456asdf some text hello
another string a123456 aaaaaaaa
If the numbers will always be surrounded by spaces then the word boundary approach is probably better.
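A small sketch of that comparison, assuming the word boundary approach means \b\d{6}\b. The class and method names are made up for illustration, and the samples are limited to ASCII text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryVsLookaround {
    static final Pattern WORD_BOUNDARY = Pattern.compile("\\b\\d{6}\\b");
    static final Pattern LOOKAROUND = Pattern.compile("(?<!\\d)(\\d{6})(?!\\d)");

    // Returns the first match of p in input, or null if there is none.
    static String first(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] samples = {
            "123456asdf some text hello",      // digits followed by a letter
            "another string a123456 aaaaaaaa", // digits preceded by a letter
            "some text 123456 and more"        // digits surrounded by spaces
        };
        for (String s : samples) {
            // Letters and digits are both word characters, so \b finds no
            // boundary between them and only the last sample matches it.
            System.out.println(first(WORD_BOUNDARY, s) + " / " + first(LOOKAROUND, s));
        }
    }
}
```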
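// Splits on every non-digit character and returns the first run of
// digits whose length is exactly num, or null if there is none.
public static String splitting(String str, int num) {
    String[] arr = str.split("[^0-9]");
    for (String s : arr) {
        if (s.length() == num) {
            return s;
        }
    }
    return null;
}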
test with
public static void main(String[] args) {
    String s = "Some text 987654321 and some more text 123456 and some other text again 654321 and more text in the end";
    System.out.println(splitting(s, 6));
}
output is
123456