Using Locales with Java's toLowerCase() and toUpperCase()

Tags: java, string, locale

I wanted code to convert all the characters in strings to uppercase or lowercase in Java.

I found a method that goes something like this:

public static String changelowertoupper() {
    String str = "CyBeRdRaGoN";
    str = str.toLowerCase(Locale.ENGLISH);
    return str;
}

Now I've read that using certain Locales, like Turkish, "returns i (without dot) instead of i (with dot)."

Is it safe to use Locales like UK, US, ENGLISH, etc.? Are there any big differences between them when applied to strings?

Which is the most preferred Locale for Strings?

548

asked Jun 16 '12 11:06

Arjun K P

2 Answers

I think you should always pass an explicit locale.

For instance, "TITLE".toLowerCase() in a Turkish locale returns "tıtle", where 'ı' is the LATIN SMALL LETTER DOTLESS I character. To obtain correct results for locale insensitive strings, use toLowerCase(Locale.ENGLISH).

The links below cover your problem, including the points to keep in mind for the Turkish case you mention.

**FROM THE LINKS**

toLowerCase() respects internationalization (i18n). It performs the case conversion with respect to your Locale. When you call toLowerCase(), toLowerCase(Locale.getDefault()) is called internally. It is locale sensitive, so you should not build logic on it that assumes locale-independent results.
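As a rough illustration of that point, the sketch below temporarily switches the default locale to Turkish and compares the no-argument call with an explicit one. The class name DefaultLocaleDemo and the helper lower are made up for this example, and Locale.forLanguageTag("tr-TR") is used here as a stand-in for a Turkish locale.

```java
import java.util.Locale;

public class DefaultLocaleDemo {
    // Hypothetical helper: lowercases with an explicitly supplied locale.
    public static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        Locale original = Locale.getDefault();
        try {
            Locale.setDefault(turkish);
            // The no-argument overload uses the default locale, which is now Turkish.
            String implicit = "TITLE".toLowerCase();
            String explicit = lower("TITLE", Locale.getDefault());
            System.out.println(implicit.equals(explicit)); // true
            System.out.println(implicit.equals("title"));  // false: the I became dotless ı (U+0131)
        } finally {
            // Restore the default so the rest of the JVM is unaffected.
            Locale.setDefault(original);
        }
        // An explicit English locale gives the plain ASCII result.
        System.out.println(lower("TITLE", Locale.ENGLISH)); // title
    }
}
```

The try/finally is only there because changing the default locale affects the whole JVM; in real code you would normally pass the locale instead of changing the default.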

import java.util.Locale;

public class ToLocaleTest {
    public static void main(String[] args) throws Exception {
        Locale.setDefault(new Locale("lt")); // set Lithuanian as the default locale
        String str = "\u00cc"; // Ì (LATIN CAPITAL LETTER I WITH GRAVE)
        System.out.println("Before case conversion is " + str + " and length is " + str.length());
        String lowerCaseStr = str.toLowerCase(); // i + combining dot above + combining grave
        System.out.println("Lower case is " + lowerCaseStr + " and length is " + lowerCaseStr.length());
    }
}

In the above program, look at the string length before and after conversion: it is 1 before and 3 after. Yes, the length of the string changes with the case conversion. Any logic that depends on the string length will break in this scenario, and a program that works in one environment may fail in another. This is a nice catch for a code review.

To make it safer, you can use the other overload, toLowerCase(Locale.ENGLISH), and always override the locale with English. But then you are no longer internationalized.
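On the original question of whether UK, US and ENGLISH differ: as far as I know, only a few languages (Turkish, Azerbaijani, Lithuanian) have locale-specific case rules, so the English variants should all map case identically. The sketch below checks that for a few inputs. The class EnglishLocalesDemo and the method sameCaseMapping are hypothetical names, and Locale.ROOT is added only as a language-neutral point of comparison; it is not mentioned in the thread.

```java
import java.util.Locale;

public class EnglishLocalesDemo {
    // Hypothetical helper: true when every given locale produces the same
    // lowercase and uppercase result for the input as the first locale does.
    public static boolean sameCaseMapping(String input, Locale... locales) {
        String lower = input.toLowerCase(locales[0]);
        String upper = input.toUpperCase(locales[0]);
        for (Locale locale : locales) {
            if (!input.toLowerCase(locale).equals(lower)
                    || !input.toUpperCase(locale).equals(upper)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // English variants plus the language-neutral ROOT locale agree with each other.
        System.out.println(sameCaseMapping("CyBeRdRaGoN TITLE",
                Locale.ENGLISH, Locale.US, Locale.UK, Locale.ROOT)); // true

        // Adding Turkish breaks the agreement because of the dotless i.
        System.out.println(sameCaseMapping("TITLE",
                Locale.ENGLISH, Locale.forLanguageTag("tr-TR"))); // false
    }
}
```

So for strings that are not natural-language text, any of these locales should do; the choice mainly matters once a locale with special rules is involved.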

So the crux is, toLowerCase() is locale specific.

reference 1
reference 2
reference 3

Dotless i (ı) is a lowercase 'i' without a dot. Its uppercase form is the usual "I". There is another character, "I with dot" (İ), whose lowercase form is the usual lowercase "i".

Have you noticed the problem? This asymmetric conversion causes serious problems in programming. We run into it mostly in Java applications because of the (IMHO) poor implementation of the toLowerCase and toUpperCase functions.
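To make the asymmetry concrete, here is a small sketch that converts a string to uppercase under one locale and back to lowercase under another. The names DotlessIDemo and roundTrip are invented for illustration.

```java
import java.util.Locale;

public class DotlessIDemo {
    public static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Hypothetical helper: uppercase under one locale, then lowercase under another.
    public static String roundTrip(String s, Locale upperLocale, Locale lowerLocale) {
        return s.toUpperCase(upperLocale).toLowerCase(lowerLocale);
    }

    public static void main(String[] args) {
        // Turkish pairs: i <-> İ (U+0130) and ı (U+0131) <-> I.
        System.out.println("i".toUpperCase(TURKISH).equals("\u0130")); // true
        System.out.println("I".toLowerCase(TURKISH).equals("\u0131")); // true

        // Staying inside one locale round-trips cleanly.
        System.out.println(roundTrip("title", TURKISH, TURKISH)); // title

        // Mixing locales does not: English uppercasing yields a plain I,
        // which Turkish lowercasing turns into the dotless ı.
        System.out.println(roundTrip("title", Locale.ENGLISH, TURKISH).equals("title")); // false
    }
}
```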

In Java, the String.toLowerCase() method converts characters to lowercase according to the default locale. This causes problems if your application runs under a Turkish locale, especially if you use this function for a file name or a URL that must stick to a certain character set.
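One way to guard against this, sketched under the assumption that the strings are machine-readable keys (file names, URL parts, header names) rather than text shown to users, is to route every such conversion through a single helper with a fixed locale. KeyNormalizer and normalizeKey are hypothetical names; Locale.ENGLISH follows the recommendation quoted in this answer.

```java
import java.util.Locale;

public class KeyNormalizer {
    // Hypothetical helper: lowercases a machine-readable key with a fixed
    // locale so the result does not depend on the JVM's default locale.
    public static String normalizeKey(String key) {
        return key.trim().toLowerCase(Locale.ENGLISH);
    }

    public static void main(String[] args) {
        Locale original = Locale.getDefault();
        try {
            // Simulate an application running under a Turkish default locale.
            Locale.setDefault(Locale.forLanguageTag("tr-TR"));
            System.out.println(normalizeKey("Content-ID"));                             // content-id
            System.out.println("Content-ID".toLowerCase().equals("content-id"));        // false
        } finally {
            Locale.setDefault(original);
        }
    }
}
```

Keeping the locale choice in one place also makes it easy to spot stray no-argument toLowerCase() calls in a code review.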

I have blogged about two serious examples before: The compile errors with Script libraries with "i" in their names and XSP Manager's fault if an XPage is in a database with "I" in its name.

There is a long history, as I said. For instance, in some R7 version the router was unable to send a message to a recipient whose name started with "I". Message reporting agents did not run in the Turkish locale until R8. Anyone with a Turkish locale could not install Lotus Notes 8.5.1 (it's real!). The list goes on...

There are almost no beta testers from Turkey, and customers don't open PMRs for these problems, so they never become a top priority for the development teams.

Even the Java team has added a special warning to the latest documentation:

This method is locale sensitive, and may produce unexpected results if used for strings that are intended to be interpreted locale independently. Examples are programming language identifiers, protocol keys, and HTML tags. For instance, "TITLE".toLowerCase() in a Turkish locale returns "tıtle", where 'ı' is the LATIN SMALL LETTER DOTLESS I character. To obtain correct results for locale insensitive strings, use toLowerCase(Locale.ENGLISH).

157

answered Oct 10 '22 11:10

shareef

You can create an appropriate locale for your String's language.

For example:

toUpperCase(new Locale("tr","TR"));

will do the trick for Turkish.
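A slightly fuller sketch of that idea, for text that really is Turkish; the class TurkishTextDemo and the method upperTurkish are names made up for this example:

```java
import java.util.Locale;

public class TurkishTextDemo {
    // Same locale as in the snippet above.
    public static final Locale TURKISH = new Locale("tr", "TR");

    // Hypothetical helper: uppercases text known to be Turkish.
    public static String upperTurkish(String s) {
        return s.toUpperCase(TURKISH);
    }

    public static void main(String[] args) {
        // With the Turkish locale the dotted i keeps its dot: İSTANBUL.
        System.out.println(upperTurkish("istanbul"));
        // With an English locale the dot is lost: ISTANBUL.
        System.out.println("istanbul".toUpperCase(Locale.ENGLISH));
    }
}
```

This is the opposite case from protocol keys and identifiers: here the language of the text is known, so the locale-specific rules are what you want.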

answered Oct 10 '22 09:10

Caner
