
Detecting Japanese characters in Java strings

Tags: java, string, regex

I am trying to detect whether a Java string contains Japanese characters. Since it does not matter to me whether the characters form a grammatically correct sentence, I thought I'd use a regex to match any Japanese character in the string, like so:

package de.cg.javatest;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JavaTest {

    public static void main(String[] args) {
        String aString = "なにげない日々。";
        Pattern pat = Pattern.compile("[\\p{InHiragana}]");
        Matcher m = pat.matcher(aString);
        System.out.println(m.matches()); // false
    }
}

However, the print statement always shows false. I have tried altering the pattern to

[\\p{IsHiragana}]
[\\p{InHiragana}]+

and I have also entered the code points manually. Is there something I am missing, or do I have to take another approach?

asked Sep 27 '14 14:09 by CannibalGorilla


1 Answer

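Matcher.matches returns true only when the pattern matches the whole string. As Anonymous commented, not all characters in your string are Hiragana: 日 is a kanji, and 々 and 。 belong to the CJK Symbols and Punctuation block. A pattern that accepts only Hiragana therefore cannot match the entire string.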
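To see which characters keep matches() from succeeding, one can print the Unicode block of each code point. The following is only a minimal sketch and not part of the original answer: the class name BlockInspector and the helper blocksOf are made up for illustration, and it assumes Java 8 or later for String.codePoints().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not from the original answer.
final class BlockInspector {

    // Returns the name of the Unicode block of every code point in the text,
    // in order. Unassigned code points are reported as "null".
    static List<String> blocksOf(String text) {
        List<String> blocks = new ArrayList<>();
        text.codePoints().forEach(cp ->
                blocks.add(String.valueOf(Character.UnicodeBlock.of(cp))));
        return blocks;
    }

    public static void main(String[] args) {
        String aString = "なにげない日々。";
        int[] codePoints = aString.codePoints().toArray();
        List<String> blocks = blocksOf(aString);
        // Print each character next to the block it belongs to.
        for (int i = 0; i < codePoints.length; i++) {
            String ch = new String(Character.toChars(codePoints[i]));
            System.out.println(ch + " -> " + blocks.get(i));
        }
    }
}
```

For the string from the question, the Hiragana characters are reported as HIRAGANA, 日 as CJK_UNIFIED_IDEOGRAPHS, and 々 and 。 as CJK_SYMBOLS_AND_PUNCTUATION, which is why a Hiragana-only pattern cannot cover the whole input.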

Changing the pattern as follows lets matches() check whether the string contains any Hiragana:

Pattern pat = Pattern.compile(".*\\p{InHiragana}.*");

Alternatively, use Matcher.find, which searches for a match anywhere in the string, so you don't need to modify the pattern:

Pattern pat = Pattern.compile("\\p{InHiragana}");  // [..] is not needed.
Matcher m = pat.matcher(aString);
System.out.println(m.find()); // true
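The pattern above only looks for Hiragana. If the goal is to detect Japanese text more broadly, one possible approach is to combine several Unicode scripts in a character class and keep using find. This is a sketch under assumptions rather than a definitive solution: the class JapaneseDetector and the method containsJapanese are hypothetical names, the script syntax \p{IsHan} requires Java 7 or later, and because Han characters are shared with Chinese, the check cannot tell Japanese kanji from Chinese text.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not from the original answer.
final class JapaneseDetector {

    // Matches a single character whose Unicode script is Hiragana, Katakana,
    // or Han. Assumption: Han is treated as "Japanese" here, although the
    // same characters also occur in Chinese text.
    private static final Pattern JAPANESE =
            Pattern.compile("[\\p{IsHiragana}\\p{IsKatakana}\\p{IsHan}]");

    // find() succeeds as soon as one matching character occurs anywhere.
    static boolean containsJapanese(String text) {
        return JAPANESE.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsJapanese("なにげない日々。")); // true
        System.out.println(containsJapanese("hello"));            // false
    }
}
```

Compiling the pattern once in a static field avoids recompiling it on every call, which matters if the check runs on many strings.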
answered Sep 30 '22 16:09 by falsetru