How can I get a Unicode character's code?

Tags: java, unicode, character

Let's say I have this:

char registered = '®';

or an umlaut, or any other Unicode character. How can I get its code?

358

asked Jan 05 '10 14:01

Geo

2 Answers

Just convert it to int:

char registered = '®';
int code = (int) registered;

In fact there's an implicit conversion from char to int so you don't have to specify it explicitly as I've done above, but I would do so in this case to make it obvious what you're trying to do.

This will give the UTF-16 code unit - which is the same as the Unicode code point for any character defined in the Basic Multilingual Plane. (And only BMP characters can be represented as char values in Java.) As Andrzej Doyle's answer says, if you want the Unicode code point from an arbitrary string, use Character.codePointAt().
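To make the difference between a code unit and a code point concrete, here is a minimal sketch. The class and method names are made up for illustration, and the characters are written as Unicode escapes so that the result does not depend on the source file's encoding.

```java
// Hypothetical demo class; the names are illustrative, not from the answers.
class CodeUnitVersusCodePoint {

    // UTF-16 code unit of a single char. The widening from char to int is
    // implicit; the cast is only there to make the intent obvious.
    static int codeUnit(char c) {
        return (int) c;
    }

    // Full Unicode code point that starts at the given char index.
    static int codePoint(String s, int index) {
        return Character.codePointAt(s, index);
    }

    public static void main(String[] args) {
        char registered = '\u00AE'; // the registered sign, inside the BMP
        System.out.println(codeUnit(registered)); // 174

        String grinning = "\uD83D\uDE00"; // U+1F600, outside the BMP
        // A single char only sees the high surrogate of the pair.
        System.out.println(codeUnit(grinning.charAt(0))); // 55357
        // codePointAt combines both surrogates into one code point.
        System.out.println(codePoint(grinning, 0)); // 128512
    }
}
```

For the registered sign both approaches agree, because a BMP character occupies exactly one char.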

Once you've got the UTF-16 code unit or Unicode code points, both of which are integers, it's up to you what you do with them. If you want a string representation, you need to decide exactly what kind of representation you want. (For example, if you know the value will always be in the BMP, you might want a fixed 4-digit hex representation prefixed with U+, e.g. "U+0020" for space.) That's beyond the scope of this question though, as we don't know what the requirements are.
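If the U+ style mentioned above is what you want, one possible sketch uses String.format with a zero-padded uppercase hex conversion. The class and method names here are hypothetical, and this is only one of several reasonable representations.

```java
// Hypothetical helper; one way to render a code point, not the only one.
class CodePointFormat {

    // "U+" followed by at least four uppercase hex digits. Values above
    // 0xFFFF simply use more digits, since %04X is a minimum width.
    static String toUPlus(int codePoint) {
        return String.format("U+%04X", codePoint);
    }

    public static void main(String[] args) {
        System.out.println(toUPlus(' '));      // U+0020
        System.out.println(toUPlus('\u00AE')); // U+00AE
        System.out.println(toUPlus(0x1F600));  // U+1F600
    }
}
```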

112

answered Sep 23 '22 08:09

Jon Skeet

A more complete, albeit more verbose, way of doing this would be to use the Character.codePointAt method. This handles supplementary characters (code points above U+FFFF), which UTF-16 encodes as a surrogate pair of two char values and which therefore cannot be represented by a single char.

In the example you've given this is not strictly necessary - if the (Unicode) character can fit inside a single (Java) char (such as the registered local variable) then it must fall within the \u0000 to \uffff range, and you won't need to worry about surrogate pairs. But if you're looking at potentially higher code points, from within a String/char array, then calling this method is wise in order to cover the edge cases.

For example, instead of

String input = ...;
char fifthChar = input.charAt(4);
int codePoint = (int) fifthChar;

use

String input = ...;
int codePoint = Character.codePointAt(input, 4);

Not only is this slightly less code in this instance, but it will handle detection of surrogate pairs for you.
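One detail worth keeping in mind: the index passed to codePointAt is a char index, not a code point index, so walking a whole string means advancing by one or two chars at a time. A rough sketch of that loop follows; the class name and the sample string are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class showing a surrogate-aware walk over a String.
class CodePointWalk {

    // Collects every code point in the string, stepping over both halves
    // of a surrogate pair at once.
    static List<Integer> codePoints(String s) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result.add(cp);
            i += Character.charCount(cp); // 1 for BMP, 2 for supplementary
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "abcd\uD83D\uDE00"; // four letters, then U+1F600

        // charAt(4) yields only the high surrogate.
        System.out.println((int) input.charAt(4)); // 55357
        // codePointAt(4) yields the whole code point.
        System.out.println(Character.codePointAt(input, 4)); // 128512

        System.out.println(input.length());            // 6 chars
        System.out.println(codePoints(input).size());  // 5 code points
    }
}
```

On Java 8 or later, input.codePoints() gives the same sequence as an IntStream without the manual loop.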

answered Sep 24 '22 08:09

Andrzej Doyle
