I'm trying to write a regex pattern that matches any sentence beginning with one or more tabs and/or spaces. For example, I want the pattern to match " hello there I like regex!", but I'm scratching my head over how to match the words after "hello". So far I have this:
String REGEX = "(?s)(\\p{Blank}+)([a-z][ ])*";
Pattern PATTERN = Pattern.compile(REGEX);
Matcher m = PATTERN.matcher(" asdsada adf adfah.");
if (m.matches()) {
    System.out.println("hurray!");
}
Any help would be appreciated. Thanks.
^ matches the start of the input (or, in multiline mode, the start of each line). It lets the regex match the phrase only if it appears at the very beginning, with no characters before it.
$ means "match the end of the string" (the position after the last character in the string). Both are called anchors; used together, they ensure that the entire string is matched instead of just a substring.
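To illustrate what the anchors change, here is a minimal Java sketch. The class and method names (AnchorDemo, containsHello, isExactlyHello) are made up for this example. Note that Matcher.matches(), which the question uses, already requires the whole input to match, so explicit anchors matter mainly with find().

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class AnchorDemo {
    // Unanchored: find() succeeds if "hello" occurs anywhere in the input.
    static boolean containsHello(String input) {
        return Pattern.compile("hello").matcher(input).find();
    }

    // Anchored with ^ and $: find() succeeds only if the input is "hello"
    // (Java's $ also tolerates a single trailing line terminator).
    static boolean isExactlyHello(String input) {
        return Pattern.compile("^hello$").matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsHello("say hello there"));  // true
        System.out.println(isExactlyHello("say hello there")); // false
        System.out.println(isExactlyHello("hello"));           // true
    }
}
```

Without the anchors the pattern is free to match a substring; with them, the surrounding text "say " and " there" makes the match fail.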
The plus sign + is a greedy quantifier, which means one or more times. For example, expression X+ matches one or more X characters. Therefore, the regular expression \s matches a single whitespace character, while \s+ will match one or more whitespace characters.
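The difference between \s and \s+ can be seen by measuring how much each one consumes. This is only a sketch; PlusDemo and firstMatchLength are names invented for the example, and the backslash is doubled because the pattern is written as a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class PlusDemo {
    // Returns the length of the first match of regex in input, or -1 if there is none.
    static int firstMatchLength(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group().length() : -1;
    }

    public static void main(String[] args) {
        // The input starts with four whitespace characters: space, tab, space, space.
        String input = " \t  hello";
        System.out.println(firstMatchLength("\\s", input));  // 1
        System.out.println(firstMatchLength("\\s+", input)); // 4
    }
}
```

Because + is greedy, \s+ takes the whole run of leading whitespace in one match instead of stopping after the first character.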
[] denotes a character class. () denotes a capturing group. [a-z0-9] -- one character that is in the range a-z OR 0-9. (a-z0-9) -- explicit capture of the literal text "a-z0-9".
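A quick way to see the difference is to test both patterns against the same inputs. The following is a small sketch with made-up names (ClassVsGroupDemo, matchesClass, matchesGroup); String.matches requires the whole string to match the pattern.

```java
// Hypothetical demo class; the names are illustrative only.
class ClassVsGroupDemo {
    // Character class: exactly one character from a-z or 0-9.
    static boolean matchesClass(String s) {
        return s.matches("[a-z0-9]");
    }

    // Capturing group: the literal seven-character text "a-z0-9".
    static boolean matchesGroup(String s) {
        return s.matches("(a-z0-9)");
    }

    public static void main(String[] args) {
        System.out.println(matchesClass("x"));      // true
        System.out.println(matchesClass("a-z0-9")); // false
        System.out.println(matchesGroup("a-z0-9")); // true
        System.out.println(matchesGroup("x"));      // false
    }
}
```

Inside square brackets the hyphen defines a range; inside parentheses it is just an ordinary character.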
An example regex to match sentences by the definition "A sentence is a series of characters, starting with at least one whitespace character, that ends in one of ., ! or ?" is as follows:
\s+[^.!?]*[.!?]
Note that newline characters will also be included in this match.
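Applied to the code from the question, this pattern might be used as in the sketch below. The class name LeadingWhitespaceSentence and the method isSentence are invented for the example, and the backslash is doubled because the pattern is a Java string literal.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class LeadingWhitespaceSentence {
    // One or more whitespace characters, then any run of non-terminators,
    // then exactly one terminator (., ! or ?).
    private static final Pattern SENTENCE = Pattern.compile("\\s+[^.!?]*[.!?]");

    // matches() requires the entire input to fit the pattern.
    static boolean isSentence(String input) {
        return SENTENCE.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSentence(" asdsada adf adfah."));        // true
        System.out.println(isSentence(" hello there I like regex!")); // true
        System.out.println(isSentence("no leading whitespace."));     // false
        System.out.println(isSentence("\t\n spans lines?"));          // true
    }
}
```

The last call shows the newline caveat: \s accepts line terminators, so leading whitespace that spans lines still matches. If only spaces and tabs should count, \p{Blank} from the question could replace \s.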
A sentence starts with a word boundary (hence \b) and ends with one or more terminators. Thus:
\b[^.!?]+[.!?]+
https://regex101.com/r/7DdyM1/1
This gives pretty accurate results. However, it does not handle fractional numbers. For example, this sentence will be interpreted as two sentences:
The value of PI is 3.141...
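The split can be reproduced with a short Java sketch that collects every match of the word-boundary pattern. SentenceSplitDemo and sentences are names made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class SentenceSplitDemo {
    // Word boundary, a run of non-terminators, then one or more terminators.
    private static final Pattern SENTENCE = Pattern.compile("\\b[^.!?]+[.!?]+");

    // Collects every non-overlapping match, scanning left to right.
    static List<String> sentences(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = SENTENCE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [The value of PI is 3., 141...]
        System.out.println(sentences("The value of PI is 3.141..."));
        // Prints [Hello there., How are you?]
        System.out.println(sentences("Hello there. How are you?"));
    }
}
```

The decimal point is treated as a terminator, so the first match stops at "3." and a second match begins at the word boundary before "141".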