Java: Multibyte string length

Question

I have a method which prints "header text" for command line programs, much like the syntax of Markdown:

1. =======================
2. This is a header string
3. =======================

The method takes a char c and repeats it n times to build lines 1 and 3, where n is the length of the header string s.

String.length() works fine with the English alphabet, but how can I find the length (the visual length, that is) of a string containing foreign multibyte characters like "Å" and "Ç"?

Ian Roberts · Accepted Answer

String.length() will be fine for those sorts of characters. Java strings are sequences of UTF-16 code units, and a single code unit is sufficient to represent the vast majority of characters in common use (Latin, Greek, Arabic, Hebrew, Chinese, Thai, Devanagari, ...).

If you might need to deal with characters above U+FFFF, then you need to use codePointCount instead of length to cope with surrogate pairs, since each such character occupies two char values.
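As a rough sketch of how that could look for the header method in the question: the class name HeaderPrinter and the method name header are made up for illustration, and the method returns the three lines as one string rather than printing them. It counts code points, not visual columns, so it assumes every character is one column wide.

```java
import java.util.Arrays;

// Hypothetical helper, not taken from the question's actual code.
class HeaderPrinter {

    // Builds a three-line header: rule, text, rule.
    static String header(String s, char c) {
        // Count code points rather than char values, so a surrogate
        // pair (a character above U+FFFF) is counted once, not twice.
        int n = s.codePointCount(0, s.length());

        char[] rule = new char[n];
        Arrays.fill(rule, c);
        String line = new String(rule);

        return line + "\n" + s + "\n" + line;
    }

    public static void main(String[] args) {
        System.out.println(header("This is a header string", '='));
        // "\u00C5" is Å and "\u00C7" is Ç; each is a single char.
        System.out.println(header("\u00C5 and \u00C7", '-'));
    }
}
```

For strings that contain only characters up to U+FFFF, s.codePointCount(0, s.length()) and s.length() return the same value, so switching costs nothing for the Å and Ç case.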

Johan Sjöberg · Answer

String.length() is fine for most Unicode characters including Å and Ç.

A Java string is UTF-16 encoded, so each character (code point) takes up 2 or 4 bytes.

Supplementary characters are the ones that take 4 bytes. Each is represented by a pair of char values (a surrogate pair), so for strings containing them the codePointCount operation must be used instead of length.
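To make the difference concrete, here is a small illustrative sketch. The class and method names (SurrogateDemo, codePoints, utf16Bytes) are made up for this example, and U+1F600 is just an arbitrary character above U+FFFF.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class comparing length() with codePointCount().
class SurrogateDemo {

    // Number of Unicode code points in s.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    // Number of bytes s occupies when encoded as UTF-16 (no byte order mark).
    static int utf16Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String bmp = "\u00C5"; // Å, U+00C5
        // Character.toChars yields the surrogate pair for a code point above U+FFFF.
        String supplementary = new String(Character.toChars(0x1F600));

        System.out.println(bmp.length() + " char, "
                + codePoints(bmp) + " code point, "
                + utf16Bytes(bmp) + " bytes");
        System.out.println(supplementary.length() + " chars, "
                + codePoints(supplementary) + " code point, "
                + utf16Bytes(supplementary) + " bytes");
    }
}
```

The first string reports one char and two bytes, while the second reports two chars and four bytes even though both hold a single code point.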

Your characters, though, are ordinary Unicode characters in the Basic Multilingual Plane, so each one is a single char.

Tags:

java

josocblaugrana
