I've been looking through the bazillion StackOverflow questions about capitalizing a word in Java, and none of them seem to care in the least about internationalization; in fact, none of them really work in an international context. So here is my question.
I have a String in Java that represents a word: all isLetter() characters, no whitespace. I want to make the first character upper case and the rest lower case. I do have the locale of my word at hand.
It's easy enough to call .substring(1).toLowerCase(Locale) for the last part of my string. I have no idea how to get the correct first character, though.
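A small illustration of why the Locale argument matters even for this "easy" part. It uses Turkish, where lower-casing the capital I (U+0049) yields the dotless ı (U+0131). The class and method names are made up for this example, and, like the question's approach, it assumes a non-empty word whose first character is a single char.

```java
import java.util.Locale;

public class TailLowercase {
    // Lower-cases everything after the first char, as described in the question.
    // Assumes a non-empty word whose first character is a single char (not a surrogate pair).
    static String lowerTail(String word, Locale locale) {
        return word.substring(1).toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // Turkish lower-cases 'I' (U+0049) to the dotless 'ı' (U+0131)...
        System.out.println(lowerTail("DIYARBAKIR", turkish));     // ıyarbakır
        // ...whereas the root locale lower-cases it to 'i' (U+0069).
        System.out.println(lowerTail("DIYARBAKIR", Locale.ROOT)); // iyarbakir
    }
}
```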
The first problem I have is with Dutch, where the digraph "ij" should be capitalized as a unit ("IJ"). I could special-case this by hand because I know about it, but there may be other languages with this kind of rule that I don't know about, and I'm sure Unicode will tell me if I ask nicely. I just don't know how to ask.
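The hand-written Dutch special case mentioned above could look roughly like this. It is a sketch only: `capitalize` is a hypothetical helper, it assumes the language code "nl" identifies Dutch, and it upper-cases the head rather than title-casing it, which is adequate for "ij" and ASCII letters but not in general.

```java
import java.util.Locale;

public class DutchCapitalize {
    // Hypothetical helper: capitalizes one word, treating a leading "ij" as a single
    // letter when the locale's language is Dutch ("nl"). Assumes a non-empty word whose
    // first letter is a single char. The head is upper-cased, not title-cased.
    static String capitalize(String word, Locale locale) {
        boolean dutchIj = "nl".equals(locale.getLanguage())
                && word.regionMatches(true, 0, "ij", 0, 2); // case-insensitive "ij" prefix
        int headLength = dutchIj ? 2 : 1;
        String head = word.substring(0, headLength).toUpperCase(locale);
        String tail = word.substring(headLength).toLowerCase(locale);
        return head + tail;
    }

    public static void main(String[] args) {
        Locale dutch = Locale.forLanguageTag("nl");
        System.out.println(capitalize("ijsland", dutch));          // IJsland
        System.out.println(capitalize("IJSLAND", dutch));          // IJsland
        System.out.println(capitalize("ijsland", Locale.ENGLISH)); // Ijsland
    }
}
```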
Even if the above problem is solved, I'm still stuck with no proper way to handle English, Turkish and Greek, because Character supports title case but not locales, and String supports locales but not title case.
If I take the code point and pass it to Character.toTitleCase(), this will fail because there is no way to pass the locale to that method. So if the system locale is English but the word is Turkish, and the first char of the word is "i", I'll get "I" instead of "İ", which is wrong. If instead I take a substring and use .toUpperCase(Locale), this will fail because it produces upper case, not title case. So if the word is Greek, I'll still get the wrong character.
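A minimal demonstration of the Turkish half of this dilemma. The two wrapper methods are made-up names that only exist to make the contrast explicit: `Character.toTitleCase` has no Locale parameter, while `String.toUpperCase(Locale)` applies the Turkish rule but upper-cases instead of title-casing.

```java
import java.util.Locale;

public class TurkishTitleCase {
    // Character.toTitleCase has no Locale parameter, so the result is the same everywhere.
    static char titleCaseIgnoringLocale(char c) {
        return Character.toTitleCase(c);
    }

    // String.toUpperCase(Locale) applies locale rules, but it upper-cases rather than title-cases.
    static String upperCaseWithLocale(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        System.out.println(titleCaseIgnoringLocale('i'));      // I (U+0049), wrong for Turkish
        System.out.println(upperCaseWithLocale("i", turkish)); // İ (U+0130)
    }
}
```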
If anyone has useful pointers, I'd be happy to hear them.
Character.toTitleCase(char ch) converts the character argument to title case using case mapping information from the UnicodeData file. If a character has no explicit titlecase mapping and is not itself a titlecase char according to UnicodeData, then the uppercase mapping is returned as an equivalent titlecase mapping.
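To see where titlecase and uppercase actually differ, a precomposed digraph such as ǆ (U+01C6, LATIN SMALL LETTER DZ WITH CARON) is a good test: its titlecase form ǅ capitalizes only the first half, while its uppercase form Ǆ capitalizes both. Note that this per-character data only covers such single-character digraphs; a two-letter sequence like Dutch "ij" has no mapping of its own. The helper `titleDiffersFromUpper` is a name invented for this sketch.

```java
public class TitleVsUpper {
    // True when a character's titlecase mapping differs from its uppercase mapping,
    // i.e. when Character.toTitleCase gives a different result than toUpperCase.
    static boolean titleDiffersFromUpper(char c) {
        return Character.toTitleCase(c) != Character.toUpperCase(c);
    }

    public static void main(String[] args) {
        char dz = '\u01C6'; // ǆ: a single character standing for the digraph "dž"
        System.out.println(Character.toTitleCase(dz));       // ǅ (U+01C5): only "D" capitalized
        System.out.println(Character.toUpperCase(dz));       // Ǆ (U+01C4): both halves capitalized
        System.out.println(Character.isTitleCase('\u01C5')); // true
        System.out.println(titleDiffersFromUpper(dz));       // true
        // 'a' has no explicit titlecase mapping, so toTitleCase falls back to uppercase.
        System.out.println(titleDiffersFromUpper('a'));      // false
    }
}
```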
There are no capitalize() or titleCase() methods in Java's String class.
Unicode characters can be used throughout Java source code: in comments, identifiers, character literals and string literals. Identifiers are not restricted to ASCII; they may use any Unicode letters and digits that Character.isJavaIdentifierStart() and Character.isJavaIdentifierPart() accept.
Java strings use UTF-16. A single Java char can only represent a character from the Basic Multilingual Plane; other characters have to be represented by a surrogate pair of two chars. This is reflected by API methods such as String.
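Because of surrogate pairs, capitalizing via `charAt(0)` and `substring(1)` can split a character in two. A minimal code-point-aware sketch, assuming title-casing the first code point with `Character.toTitleCase(int)` is acceptable (so locale-specific rules such as the Turkish dotted İ are still not applied). `capitalize` is a hypothetical helper; U+10428 (DESERET SMALL LETTER LONG I) is used only as an example of a letter outside the BMP.

```java
import java.util.Locale;

public class CodePointCapitalize {
    // Title-cases the first code point (not the first char) and lower-cases the rest.
    // Character.toTitleCase(int) takes no Locale, so locale-specific rules are NOT applied.
    static String capitalize(String word, Locale locale) {
        if (word.isEmpty()) {
            return word;
        }
        int first = word.codePointAt(0);
        int restStart = Character.charCount(first); // 2 when 'first' is a surrogate pair
        return new StringBuilder()
                .appendCodePoint(Character.toTitleCase(first))
                .append(word.substring(restStart).toLowerCase(locale))
                .toString();
    }

    public static void main(String[] args) {
        // U+10428 lies outside the BMP, so it occupies two chars in UTF-16.
        String deseret = new String(Character.toChars(0x10428)) + "AB";
        String result = capitalize(deseret, Locale.ROOT);
        System.out.println(Integer.toHexString(result.codePointAt(0))); // 10400
        // Character.toTitleCase(word.charAt(0)) would only see the high surrogate
        // and return it unchanged.
    }
}
```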
Like you, I was unable to find a suitable method in the core Java API.
However, there does seem to be a locale-sensitive string title-casing method, UCharacter#toTitleCase, in the ICU library.
Looking at the source for the relevant ICU methods (UCharacter#toTitleCase and UCaseProps#toUpperOrTitle), there don't seem to be many locale-specific special cases for title-casing, so you might be able to get away with the following:
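A minimal core-Java sketch in that spirit, written for illustration rather than taken from the answer: title-case the first code point with `Character.toTitleCase`, and hand-code a few special cases. The special cases are assumptions, not an exhaustive list: Turkish ("tr") and Azerbaijani ("az") map 'i' to 'İ', and Dutch ("nl") capitalizes a leading "ij" as "IJ". Everything else, including Greek, relies on the per-character titlecase data. `LocaleTitleCase.capitalize` is a hypothetical name.

```java
import java.util.Locale;

public class LocaleTitleCase {
    // Hypothetical helper: core Java plus a few hand-written special cases.
    // Assumed special cases (not exhaustive):
    //   - Turkish ("tr") and Azerbaijani ("az"): 'i' title-cases to 'İ' (U+0130);
    //   - Dutch ("nl"): a leading "ij" is capitalized as a unit, "IJ".
    // Everything else uses Character.toTitleCase on the first code point.
    static String capitalize(String word, Locale locale) {
        if (word.isEmpty()) {
            return word;
        }
        String lang = locale.getLanguage();
        if (lang.equals("nl") && word.regionMatches(true, 0, "ij", 0, 2)) {
            return "IJ" + word.substring(2).toLowerCase(locale);
        }
        int first = word.codePointAt(0);
        int titled;
        if ((lang.equals("tr") || lang.equals("az")) && first == 'i') {
            titled = '\u0130'; // LATIN CAPITAL LETTER I WITH DOT ABOVE
        } else {
            titled = Character.toTitleCase(first);
        }
        String rest = word.substring(Character.charCount(first)).toLowerCase(locale);
        return new StringBuilder().appendCodePoint(titled).append(rest).toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("istanbul", Locale.forLanguageTag("tr"))); // İstanbul
        System.out.println(capitalize("istanbul", Locale.ENGLISH));              // Istanbul
        System.out.println(capitalize("ijsland", Locale.forLanguageTag("nl")));  // IJsland
        System.out.println(capitalize("\u01C6ungla", Locale.ROOT));              // ǅungla
    }
}
```

If ICU4J is available, its locale-aware title-casing is likely the more complete option; this sketch only covers the cases listed in its comments.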