How do I match Unicode characters in Java?

I'm trying to match Unicode characters in Java.

Input string: informa

String to match: informátion

So far I've tried this:

    Pattern p = Pattern.compile("informa[\u0000-\uffff].*",
            Pattern.UNICODE_CASE | Pattern.CANON_EQ | Pattern.CASE_INSENSITIVE);
    String s = "informátion";
    Matcher m = p.matcher(s);
    if (m.matches()) {
        System.out.println("Match!");
    } else {
        System.out.println("No match");
    }

It comes out as "No match". Any ideas?

770

asked Jun 23 '10 16:06

ankimal

1 Answer

The term "Unicode characters" is not specific enough. Every character a Java String can hold is a Unicode character, so a pattern for "Unicode characters" would also match "normal" characters such as plain ASCII letters. The term is, however, very often used when one actually means "characters which are not in the printable ASCII range".

In regex terms that would be [^\x20-\x7E].

boolean containsNonPrintableASCIIChars = string.matches(".*[^\\x20-\\x7E].*");
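
One caveat with the one-liner above: `.` does not match line terminators by default, so `matches(".*[^\\x20-\\x7E].*")` returns false for input with more than one line break, such as "a\nb\nc". A minimal sketch of the same check using `Matcher.find()` avoids that; the class and method names here are illustrative, not from the original answer.

```java
import java.util.regex.Pattern;

class NonAsciiCheck {
    // Anything outside the printable ASCII range U+0020..U+007E.
    private static final Pattern NON_PRINTABLE_ASCII =
            Pattern.compile("[^\\x20-\\x7E]");

    // Hypothetical helper name, introduced only for this example.
    static boolean containsNonPrintableAscii(String s) {
        // find() scans for the first occurrence anywhere in the input,
        // so no leading or trailing ".*" is needed.
        return NON_PRINTABLE_ASCII.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(containsNonPrintableAscii("information"));      // false
        System.out.println(containsNonPrintableAscii("inform\u00e1tion")); // true
        System.out.println(containsNonPrintableAscii("a\nb\nc"));          // true
    }
}
```

Precompiling the pattern also avoids recompiling the regex on every call, which `String.matches` does internally.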

Depending on what you'd like to do with this information, here are some useful follow-up answers:

Get rid of special characters
Get rid of diacritical marks
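
For the second option, one common approach (a sketch, not necessarily what the linked answer shows) is to decompose the string with `java.text.Normalizer` and then drop the combining marks, so that "informátion" can be compared against the plain prefix "informa". The class and method names are hypothetical.

```java
import java.text.Normalizer;

class DiacriticStripper {
    // Hypothetical helper name, introduced only for this example.
    static String stripDiacritics(String s) {
        // NFD splits a precomposed character such as U+00E1 into the
        // base letter 'a' followed by the combining acute accent U+0301.
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        // \p{M} matches Unicode mark characters, including combining accents.
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String stripped = stripDiacritics("inform\u00e1tion");
        System.out.println(stripped);                       // information
        System.out.println(stripped.startsWith("informa")); // true
    }
}
```

This only helps for characters that have a canonical decomposition; letters such as "ø" or "ł" are left unchanged and would need separate handling.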

141

answered Sep 23 '22 12:09

BalusC

Tags: java, regex, unicode
