I have a method which prints "header text" for command line programs, much like the syntax of Markdown:
1. =======================
2. This is a header string
3. =======================
This method takes a char c for lines 1 and 3 and repeats it n times based on the length of s.
String.length() works fine with the English alphabet, but how can I find the length (the visual length, that is) of a string containing foreign multibyte characters like "Å" and "Ç"?
String.length will be fine for those sorts of characters, as Java strings work in UTF-16, which is sufficient to represent the vast majority of characters in common use (Latin, Greek, Arabic, Hebrew, Chinese, Thai, Devanagari, ...).
If you might need to deal with characters above U+FFFF, then you need to use codePointCount instead of length to cope with surrogate pairs.
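As a rough illustration of the difference, the sketch below compares the two counts for a string of BMP characters and for a single character above U+FFFF. The class and method names (LengthDemo, codeUnits, codePoints) are made up for this example; only String.length, String.codePointCount and Character.toChars are standard library calls.

```java
public class LengthDemo {
    // Number of UTF-16 code units, which is what String.length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair is counted once.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // "ÅÇ": both characters are below U+FFFF, one code unit each.
        String bmp = "\u00C5\u00C7";
        // U+1D538 lies above U+FFFF, so it is stored as a surrogate pair.
        String supplementary = new String(Character.toChars(0x1D538));

        System.out.println(codeUnits(bmp) + " vs " + codePoints(bmp));
        System.out.println(codeUnits(supplementary) + " vs " + codePoints(supplementary));
    }
}
```

For the first string both counts agree; for the second, length reports two code units while codePointCount reports one character.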
String.length() is fine for most Unicode characters, including Å and Ç.
A Java string is UTF-16 encoded: each char is 2 bytes, and each Unicode character (code point) takes up 2 or 4 bytes.
Supplementary characters are the characters that take 4 bytes. Each one is implemented as a pair of two chars (a surrogate pair), in which case the codePointCount operation must be used instead of length.
Such characters, though, most certainly exist in the standard Unicode specification.
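Applied to the header method from the question, this could look roughly like the sketch below. The asker's actual code is not shown, so HeaderPrinter and header are hypothetical names, and the sketch assumes that one code point occupies one column on the terminal, which is only an approximation of the visual length.

```java
public class HeaderPrinter {
    // Builds a three-line header: a rule of c, the text s, and the same rule again.
    // Hypothetical helper; the rule length is the number of code points in s,
    // so a surrogate pair contributes one rule character rather than two.
    static String header(String s, char c) {
        int n = s.codePointCount(0, s.length());
        StringBuilder rule = new StringBuilder(n);
        for (int i = 0; i < n; i++) {
            rule.append(c);
        }
        return rule + "\n" + s + "\n" + rule;
    }

    public static void main(String[] args) {
        System.out.println(header("This is a header string", '='));
    }
}
```

For strings made only of characters such as Å and Ç, swapping codePointCount back to length would give the same result.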