My string looks like this:
"Chitkara DK, Rawat DJY, Talley N. The epidemiology of childhood recurrent abdominal pain in Western countries: a systematic review. Am J Gastroenterol. 2005;100(8):1868-75. DOI."
What I want is to get the uppercase letters (as separate words only) up to the first dot, i.e. DK DJY N, but not anything after that dot, such as J or DOI.
Here's the part of my code for the Java Pattern class:
\\b[A-Z]{1,3}\\b
Is there a general option in regex to stop matching after a certain character?
[] denotes a character class. () denotes a capturing group. [a-z0-9] -- One character that is in the range a-z OR 0-9. (a-z0-9) -- A capturing group that matches the literal text a-z0-9 .
Backslashes in Java. The backslash \ is an escape character in Java Strings, so it has a predefined meaning in a string literal. You have to write a double backslash \\ to produce a single backslash. For example, to use \w in a regex, you must write \\w in the Java string.
To match a character that has a special meaning in regex, prefix it with a backslash ( \ ). E.g., regex \. matches "." ; regex \+ matches "+" ; and regex \( matches "(" . To match a backslash itself, use regex \\ .
Difference between matches() and find() in Java regex: the matches() method returns true only if the regular expression matches the whole text; otherwise it returns false. The find() method, by contrast, searches for the next occurrence of the regular expression anywhere in the text.
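As a small illustration of that difference, here is a sketch that applies the pattern from the question with both methods. The class and method names are made up for this example and are not part of any library.

```java
import java.util.regex.Pattern;

class MatchesVersusFind {
    // The pattern from the question; "\\b" in Java source is the regex \b
    private static final Pattern ABBREVIATION = Pattern.compile("\\b[A-Z]{1,3}\\b");

    // matches(): true only if the ENTIRE input is one 1-3 letter uppercase word
    static boolean isAbbreviation(String input) {
        return ABBREVIATION.matcher(input).matches();
    }

    // find(): true if such a word occurs ANYWHERE in the input
    static boolean containsAbbreviation(String input) {
        return ABBREVIATION.matcher(input).find();
    }
}
```

For extracting DK, DJY and N out of a longer string, find() in a loop is the method you want; matches() would only succeed on an input such as "DK" on its own.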
You can make use of continuous matching using \G and extract your desired matches from the first capturing group:
(?:\\G|^)[^.]+?\\b([A-Z]{1,3})\\b
You need to use the MULTILINE flag to use this in a multiline context. If your content is always a single line, you may drop the |^ from your pattern.
See https://regex101.com/r/JXIu21/3
Note that regex101 uses a PCRE pattern, but all features used are also available in Java regex.
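For reference, a minimal sketch of how this could be wired up in Java for the single-line case (so without the |^ alternative). The class and method names are invented for illustration. Note that [^.]+? needs at least one character before each match, so an abbreviation at the very start of the input would be skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstSentenceAbbreviations {
    // \G anchors each match at the end of the previous one (or at the start of
    // the input for the first match). [^.]+? cannot cross a dot, so the chain
    // of matches breaks at the first "." and find() returns false from then on.
    private static final Pattern CHAINED =
            Pattern.compile("\\G[^.]+?\\b([A-Z]{1,3})\\b");

    static List<String> extract(String input) {
        List<String> results = new ArrayList<>();
        Matcher m = CHAINED.matcher(input);
        while (m.find()) {
            results.add(m.group(1)); // group 1 holds only the uppercase word
        }
        return results;
    }
}
```

Calling extract with the citation string from the question should yield DK, DJY and N, and nothing from after the first dot.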
Sebastian Proske's answer is great, but it's often easier (and more readable) to split complex parsing tasks into separate steps. We can split your goal into two separate steps and thereby create a much simpler and more clearly-correct solution, using your original pattern.
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

private static final Pattern UPPER_CASE_ABBV_PATTERN = Pattern.compile("\\b[A-Z]{1,3}\\b");

public static List<String> getAbbreviationsInFirstSentence(String input) {
    // isolate the first sentence, since that's all we care about;
    // the limit of 2 keeps index 0 valid even for input like "."
    String firstSentence = input.split("\\.", 2)[0];
    // then look for matches in the first sentence
    Matcher m = UPPER_CASE_ABBV_PATTERN.matcher(firstSentence);
    List<String> results = new ArrayList<>();
    while (m.find()) {
        results.add(m.group());
    }
    return results;
}