I need a matcher like this:
Matcher kuchen = Pattern.compile("gibt es Kuchen in der K\u00FCche", Pattern.CASE_INSENSITIVE).matcher("");
The problem is that the text is not plain ASCII. I know that in this particular case I could use [\u00FC\u00DC]
for the ü, but I need something more general (I am building the regex from other matcher groups). According to the Javadoc:
By default, case-insensitive matching assumes that only characters in the US-ASCII charset are being matched. Unicode-aware case-insensitive matching can be enabled by specifying the UNICODE_CASE flag in conjunction with this flag.
Can anybody tell me how to specify the two flags in conjunction?
The plus sign + is a greedy quantifier, which means one or more times. For example, expression X+ matches one or more X characters. Therefore, the regular expression \s matches a single whitespace character, while \s+ will match one or more whitespace characters.
The flags() method of the Pattern class in Java returns the pattern's match flags. The match flags are a bit mask that may include CASE_INSENSITIVE, MULTILINE, DOTALL, UNICODE_CASE, CANON_EQ, UNIX_LINES, LITERAL, UNICODE_CHARACTER_CLASS and COMMENTS. Syntax: public int flags()
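As a minimal sketch of how the bit mask returned by flags() can be inspected, the following checks individual flags with a bitwise AND. The class name FlagsDemo and the helper hasFlag are made up for this illustration; only Pattern.compile, Pattern.flags() and the flag constants come from java.util.regex.

```java
import java.util.regex.Pattern;

class FlagsDemo {
    // Hypothetical helper: true if the given flag bit is set on the pattern.
    static boolean hasFlag(Pattern pattern, int flag) {
        return (pattern.flags() & flag) != 0;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(
                "kuchen", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

        System.out.println(hasFlag(p, Pattern.CASE_INSENSITIVE));
        System.out.println(hasFlag(p, Pattern.UNICODE_CASE));
        System.out.println(hasFlag(p, Pattern.MULTILINE));
    }
}
```

The first two checks report the flags passed to compile; MULTILINE was not passed, so its bit is not set.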
Backslashes in Java. The backslash \ is an escape character in Java string literals, so it has a predefined meaning there. You have to write a double backslash \\ to get a single backslash. If you want the regex \w, you must write \\w in the Java string.
Pattern.MULTILINE or (?m) tells Java to let the anchors ^ and $ match at the start and end of each line (otherwise they only match at the start/end of the entire string).
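A small sketch of the difference MULTILINE makes, assuming a three-line input string chosen for this illustration. The class name MultilineDemo and the helper countMatches are hypothetical; the regex and flag usage are standard java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineDemo {
    // Hypothetical helper: counts how often the pattern is found in the input.
    static int countMatches(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "eins\nzwei\ndrei";

        // Without MULTILINE, ^ and $ anchor to the whole input only.
        System.out.println(countMatches(Pattern.compile("^\\w+$"), text));

        // With the flag, or the embedded (?m), they anchor to each line.
        System.out.println(countMatches(Pattern.compile("^\\w+$", Pattern.MULTILINE), text));
        System.out.println(countMatches(Pattern.compile("(?m)^\\w+$"), text));
    }
}
```

The first count is zero because the whole input contains line breaks, which \w does not match; the other two find one match per line.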
Try
Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE
as the flags argument; that should solve the issue. The flags are bit masks, so OR-ing them together enables both features at once.
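Applied to the matcher from the question, this could look like the sketch below. The class and method names (KuchenMatcherDemo, matchesUnicode, matchesAsciiOnly) are invented for the example; the pattern string and the flags are the ones discussed above.

```java
import java.util.regex.Pattern;

class KuchenMatcherDemo {
    static final String REGEX = "gibt es Kuchen in der K\u00FCche";

    // Both flags OR-ed together: case-insensitive matching that also
    // folds non-ASCII letters such as the u-umlaut.
    static boolean matchesUnicode(String input) {
        return Pattern
                .compile(REGEX, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(input)
                .matches();
    }

    // CASE_INSENSITIVE alone: only US-ASCII letters are case-folded.
    static boolean matchesAsciiOnly(String input) {
        return Pattern
                .compile(REGEX, Pattern.CASE_INSENSITIVE)
                .matcher(input)
                .matches();
    }

    public static void main(String[] args) {
        String upper = "GIBT ES KUCHEN IN DER K\u00DCCHE";
        System.out.println(matchesUnicode(upper));
        System.out.println(matchesAsciiOnly(upper));
    }
}
```

With both flags the all-uppercase input matches; with CASE_INSENSITIVE alone it does not, because the uppercase Ü is not folded to ü. The same pair of flags can also be written inline at the start of the regex as (?iu), which may be convenient when the pattern is assembled from other strings.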