Unicode-correct title case in Java

Tags: java, string, unicode

I've looked through the bazillion Stack Overflow questions about capitalizing a word in Java, and none of them seems to care in the least about internationalization. As a matter of fact, none of them really works in an international context. So here is my question.

I have a String in Java that represents a single word: every character satisfies isLetter(), and there is no whitespace. I want to make the first character upper case and the rest lower case. I have the word's locale at hand.

It's easy enough to call .substring(1).toLowerCase(Locale) for the last part of my string. I have no idea how to get the correct first character, though.

The first problem I have is with Dutch, where "ij" is a digraph and both letters should be capitalized together. I could special-case this by hand, because I know about it, but there may be other languages with this kind of thing that I don't know about. I'm sure Unicode will tell me if I ask nicely, but I don't know how to ask.

Even if the above problem is solved, I'm still stuck with no proper way to handle English, Turkish and Greek, because Character supports titlecase but no locale, and String supports locales but not titlecase.

If I take the first code point and pass it to Character.toTitleCase(), this fails because the method takes no locale. So if the word is Turkish and its first character is "i", I get "I" instead of "İ", which is wrong. If I instead take a substring and use .toUpperCase(Locale), this fails because the result is upper case, not title case. So if the word is Greek, I still get the wrong character.
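The mismatch described above can be reproduced with the JDK alone. The following is only a small sketch: the class and method names are made up for illustration, and the "ǆ" digraph (U+01C6) is an assumed example that stands in for any letter whose title-case and upper-case forms differ.

```java
import java.util.Locale;

// Hypothetical demo class; the names are illustrative and not part of any library.
final class TitleCaseProblemDemo {
    private TitleCaseProblemDemo() {}

    // Character.toTitleCase has no Locale parameter, so the default mapping is always used.
    static String titleCaseFirstCodePoint(String word) {
        int first = word.codePointAt(0);
        return new StringBuilder()
                .appendCodePoint(Character.toTitleCase(first))
                .append(word.substring(Character.charCount(first)))
                .toString();
    }

    // String.toUpperCase accepts a Locale, but it produces upper case, not title case.
    static String upperCaseFirstCodePoint(String word, Locale locale) {
        int len = Character.charCount(word.codePointAt(0));
        return word.substring(0, len).toUpperCase(locale) + word.substring(len);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");

        // Turkish: the locale-blind method yields a plain "I" (U+0049).
        System.out.println(titleCaseFirstCodePoint("izmir"));          // Izmir
        // The locale-aware method yields the dotted capital "İ" (U+0130).
        System.out.println(upperCaseFirstCodePoint("izmir", turkish)); // İzmir

        // Digraph U+01C6: title case is U+01C5, upper case is U+01C4.
        String word = "\u01C6ep";
        System.out.println(titleCaseFirstCodePoint(word).codePointAt(0) == 0x01C5);              // true
        System.out.println(upperCaseFirstCodePoint(word, Locale.ROOT).codePointAt(0) == 0x01C4); // true
    }
}
```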

If anyone has useful pointers, I'd be happy to hear them.

asked Sep 09 '11 11:09

Jean

1 Answer

Like you, I was unable to find a suitable method in the core Java API.

However, there does seem to be a locale-sensitive string-title-case method (UCharacter#toTitleCase) in the ICU library.

Looking at the source for the relevant ICU methods (UCharacter#toTitleCase and UCaseProps#toUpperOrTitle), there don't seem to be many locale-specific special cases for title-casing, so you might be able to get away with the following:

1. Find the first cased character in the string.
2. If it has a title-case form distinct from its upper-case form, use that.
3. Otherwise, perform a locale-sensitive upper-case on that first character and its combining characters.
4. Perform a locale-sensitive lower-case on the rest of the string.
5. If the locale is Dutch and the first cased character is an "I" followed by a "j", upper-case the "j".
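The steps above could be sketched with the core JDK roughly as follows. This is a minimal, untuned sketch rather than ICU's actual algorithm: the class name SimpleTitleCase and its helpers are hypothetical, "cased" is approximated with Character.isUpperCase/isLowerCase/isTitleCase, and "combining characters" is approximated by the Unicode mark categories. Because the head and tail are converted separately, context-dependent rules that span the split are not handled, and multi-character upper-case mappings such as "ß" to "SS" remain an unaddressed edge case.

```java
import java.util.Locale;

// Hypothetical helper; an approximation of the steps listed above, not ICU's implementation.
final class SimpleTitleCase {
    private SimpleTitleCase() {}

    static String titleCase(String word, Locale locale) {
        // Step 1: find the first cased character.
        int start = 0;
        while (start < word.length() && !isCased(word.codePointAt(start))) {
            start += Character.charCount(word.codePointAt(start));
        }
        if (start == word.length()) {
            return word; // nothing cased (or empty input): leave the string unchanged
        }

        int first = word.codePointAt(start);
        int afterFirst = start + Character.charCount(first);

        // Extend the head over any combining marks attached to the first character.
        int end = afterFirst;
        while (end < word.length() && isCombiningMark(word.codePointAt(end))) {
            end += Character.charCount(word.codePointAt(end));
        }

        String head;
        int title = Character.toTitleCase(first);
        if (title != Character.toUpperCase(first)) {
            // Step 2: a distinct title-case form exists, so use it and keep the marks as-is.
            head = new StringBuilder()
                    .appendCodePoint(title)
                    .append(word, afterFirst, end)
                    .toString();
        } else {
            // Step 3: locale-sensitive upper-casing of the character plus its marks.
            head = word.substring(start, end).toUpperCase(locale);
        }

        // Step 4: locale-sensitive lower-casing of the rest.
        String tail = word.substring(end).toLowerCase(locale);

        // Step 5: Dutch "ij" digraph, where both letters are capitalized together.
        if ("nl".equals(locale.getLanguage()) && head.equals("I") && tail.startsWith("j")) {
            tail = "J" + tail.substring(1);
        }

        return word.substring(0, start) + head + tail;
    }

    // Approximation of "cased": the code point is an upper-, lower-, or title-case letter.
    private static boolean isCased(int cp) {
        return Character.isUpperCase(cp) || Character.isLowerCase(cp) || Character.isTitleCase(cp);
    }

    // Approximation of "combining character": any Unicode mark category.
    private static boolean isCombiningMark(int cp) {
        int type = Character.getType(cp);
        return type == Character.NON_SPACING_MARK
                || type == Character.COMBINING_SPACING_MARK
                || type == Character.ENCLOSING_MARK;
    }
}
```

With this sketch, titleCase("izmir", Locale.forLanguageTag("tr")) gives "İzmir", titleCase("IJSSELMEER", Locale.forLanguageTag("nl")) gives "IJsselmeer", and titleCase("hELLO", Locale.ENGLISH) gives "Hello".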

answered Sep 21 '22 06:09

Stuart Cook
