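The Java programming language has extensive support for different charsets and character encodings. Since JDK 18 the default charset is UTF-8; older versions derive the default from the platform's locale and encoding settings.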
Unfortunately, the file.encoding property has to be specified as the JVM starts up; by the time your main method is entered, the character encoding used by String.getBytes() and the default constructors of InputStreamReader and OutputStreamWriter has been permanently cached.
As Edward Grech points out, in a special case like this, the environment variable JAVA_TOOL_OPTIONS can be used to specify this property, but it's normally done like this:
java -Dfile.encoding=UTF-8 … com.x.Main
Charset.defaultCharset() will reflect changes to the file.encoding property (unless the JVM has already cached its result), but most of the code in the core Java libraries that needs to determine the default character encoding does not use this mechanism.
When you are encoding or decoding, you can query the file.encoding property or Charset.defaultCharset() to find the current default encoding, and use the appropriate method or constructor overload to specify it.
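As a rough sketch of that advice (the class name DefaultCharsetQuery and the helper decode are made up for this illustration), the following prints what the JVM currently reports as its default and then decodes bytes through the InputStreamReader overload that takes a Charset:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this illustration.
class DefaultCharsetQuery {

    // Decodes bytes through the InputStreamReader overload that takes a Charset,
    // rather than the one-argument constructor that falls back to the default.
    static String decode(byte[] data, Charset charset) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            char[] buffer = new char[256];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Both values depend on the machine and JVM version.
        System.out.println("file.encoding  = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset = " + Charset.defaultCharset());

        byte[] data = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 encoding of U+00E9
        // Explicit charset: same result everywhere.
        System.out.println(decode(data, StandardCharsets.UTF_8).equals("\u00e9"));
        // Default charset passed explicitly: result varies with the environment.
        System.out.println(decode(data, Charset.defaultCharset()).length());
    }
}
```

Passing Charset.defaultCharset() explicitly behaves like the one-argument constructor, but it makes the dependency on the environment visible in the code.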
From the JVM™ Tool Interface documentation…
Since the command-line cannot always be accessed or modified, for example in embedded VMs or simply VMs launched deep within scripts, a JAVA_TOOL_OPTIONS variable is provided so that agents may be launched in these cases.
By setting the (Windows) environment variable JAVA_TOOL_OPTIONS to -Dfile.encoding=UTF8, the (Java) System property will be set automatically every time a JVM is started. You will know that the parameter has been picked up because the following message will be posted to System.err:
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=UTF8
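I have a hacky way that worked for me. It relies on a private field inside the JDK, so treat it as a last resort: newer Java versions restrict this kind of reflective access, and the code may fail there.
import java.lang.reflect.Field;
import java.nio.charset.Charset;
System.setProperty("file.encoding", "UTF-8");
Field charset = Charset.class.getDeclaredField("defaultCharset");
charset.setAccessible(true);
// Clear the cached default so it is looked up again from file.encoding.
charset.set(null, null);
This tricks the JVM into thinking that the default charset has not been set yet, so it sets it again, this time to UTF-8, at runtime!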
I think a better approach than setting the platform's default character set, especially as you seem to have restrictions on affecting the application deployment, let alone the platform, is to call the much safer String.getBytes("charsetName"). That way your application is not dependent on things beyond its control.
I personally feel that String.getBytes() should be deprecated, as it has caused serious problems in a number of cases I have seen, where the developer did not account for the default charset possibly changing.
I can't answer your original question, but I would like to offer you some advice: don't depend on the JVM's default encoding. It's always best to explicitly specify the desired encoding (e.g. "UTF-8") in your code. That way, you know it will work even across different systems and JVM configurations.
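A small sketch of what "explicitly specify the encoding" can look like for strings. The class and method names are invented for this example; StandardCharsets and the Charset overloads of getBytes and the String constructor are part of the standard library, and unlike getBytes("charsetName") they do not throw a checked exception:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical class name, used only for this illustration.
class ExplicitEncodingExample {

    // Encodes with a fixed charset, independent of the JVM default.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes with the same fixed charset.
    static String fromUtf8(byte[] data) {
        return new String(data, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "na\u00efve"; // five chars, one of them non-ASCII
        byte[] data = toUtf8(text);
        System.out.println(Arrays.toString(data));
        System.out.println(fromUtf8(data).equals(text));
    }
}
```

Because the charset is named at every call site, the byte sequence is the same on every machine, whatever file.encoding happens to be.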
Try this:
new OutputStreamWriter(new FileOutputStream("Your_file_fullpath"), StandardCharsets.UTF_8)
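For a fuller picture, here is a sketch that wraps the same constructor call in try-with-resources so the file is closed properly. The class name, the helper names and the temporary file are assumptions made for this example, not part of the original suggestion:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical class name, used only for this illustration.
class Utf8FileWriterDemo {

    // Writes text to the file as UTF-8, whatever the JVM default charset is.
    static void writeUtf8(Path file, String text) {
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(file.toFile()), StandardCharsets.UTF_8)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes text to a temporary file and reports how many bytes ended up on disk.
    static int utf8FileSize(String text) {
        try {
            Path file = Files.createTempFile("utf8-demo", ".txt");
            try {
                writeUtf8(file, text);
                return Files.readAllBytes(file).length;
            } finally {
                Files.delete(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+00E9 takes two bytes in UTF-8.
        System.out.println(utf8FileSize("\u00e9") + " bytes written");
    }
}
```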
I have tried a lot of things, but the sample code here works perfectly.
The crux of the code is:
String s = "एक गाव में एक किसान";
String out = new String(s.getBytes("UTF-8"), "ISO-8859-1");
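It may help to see what that second line does. It does not convert the text. It takes the UTF-8 bytes and reads each one as a separate ISO-8859-1 character, so the resulting string is longer and looks garbled until something writes it back out byte for byte. The sketch below (class and method names are invented for this illustration) shows the step and its reverse:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical class name, used only for this illustration.
class ReinterpretBytesDemo {

    // Reads the UTF-8 bytes of the text as if they were ISO-8859-1 (one char per byte).
    static String utf8BytesAsLatin1(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Reverses the step above: turns each char back into one byte, then decodes as UTF-8.
    static String latin1CharsAsUtf8(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u090f\u0915"; // the first word of the sample text above
        String out = utf8BytesAsLatin1(s);
        System.out.println(s.length() + " chars -> " + out.length() + " chars");
        System.out.println(latin1CharsAsUtf8(out).equals(s));
    }
}
```

The round trip is lossless because ISO-8859-1 maps every byte value to exactly one character, but the intermediate string only makes sense if the consumer on the other side performs the reverse step.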