I have this code stub:
System.out.println(param+"="+value);
param = URLEncoder.encode(param, "UTF-8");
value = URLEncoder.encode(value, "UTF-8");
System.out.println(param+"="+value);
This gives this result in Eclipse:
p=指甲油
p=%E6%8C%87%E7%94%B2%E6%B2%B9
But when I run the same code from the command line, I get the following output:
p=指甲油
p=%C3%8A%C3%A5%C3%A1%C3%81%C3%AE%E2%89%A4%C3%8A%E2%89%A4%CF%80
What could be the problem?
Your Mac is using the Mac OS Roman encoding in the terminal. The Chinese characters are being incorrectly interpreted as Mac OS Roman instead of UTF-8 before they are sent to Java.
As evidence, those Chinese characters consist of the following (hex) bytes in UTF-8:
指 = 0xE6 0x8C 0x87
甲 = 0xE7 0x94 0xB2
油 = 0xE6 0xB2 0xB9
Then check the Mac OS Roman codepage layout; those (hex) bytes represent the following characters:
Ê å á
Á î ≤
Ê ≤ π
Now, put them together and URL-encode them using UTF-8:
System.out.println(URLEncoder.encode("ÊåáÁî≤Ê≤π", "UTF-8"));
Look at what it prints:
%C3%8A%C3%A5%C3%A1%C3%81%C3%AE%E2%89%A4%C3%8A%E2%89%A4%CF%80
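The whole chain can also be reproduced in one go. The following is a minimal sketch, not the asker's actual code: the class name MojibakeDemo and the method misdecodeAndEncode are made up for illustration, and it assumes the JDK ships the x-MacRoman charset (it is part of the extended charsets, so Charset.forName may throw UnsupportedCharsetException on a stripped-down runtime). The Chinese text is written with Unicode escapes so that the source file encoding cannot interfere.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {

    // Hypothetical helper: simulates the bug by taking the UTF-8 bytes of the
    // text, decoding them with the wrong charset, and then URL-encoding the
    // garbled result as UTF-8.
    public static String misdecodeAndEncode(String text, String wrongCharset)
            throws UnsupportedEncodingException {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        // Assumption: the named charset is available in this runtime.
        String garbled = new String(utf8Bytes, Charset.forName(wrongCharset));
        return URLEncoder.encode(garbled, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String text = "\u6307\u7532\u6CB9"; // the three characters from the question

        // Correctly decoded text, URL-encoded as UTF-8.
        System.out.println(URLEncoder.encode(text, "UTF-8"));

        // Same bytes, but misread as Mac OS Roman before URL-encoding.
        System.out.println(misdecodeAndEncode(text, "x-MacRoman"));
    }
}
```

If x-MacRoman is available, the first line should match the Eclipse output from the question and the second line the command-line output.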
To fix your problem, tell your Mac to use UTF-8 encoding in the terminal. Honestly, I can't say offhand how to do that, as I don't use a Mac. Your Eclipse encoding configuration is totally fine, but in case you ever need to change it, you can configure it via Window > Preferences > General > Workspace > Text File Encoding.
Update: I missed a comment:
I am reading the value from a text file
If those variables originate from a text file rather than from command-line input, as I initially assumed, then you need to solve the problem differently. Apparently, you were using a Reader implementation that relies on the runtime environment's default character encoding, like so:
Reader reader = new FileReader("/file.txt");
// ...
You should instead explicitly specify the desired encoding when creating the reader. You can do that with the InputStreamReader constructor:
Reader reader = new InputStreamReader(new FileInputStream("/file.txt"), "UTF-8");
// ...
This explicitly tells Java to read /file.txt using UTF-8 instead of the runtime environment's default encoding, which is available via Charset#defaultCharset():
System.out.println("This runtime environment uses as default charset " + Charset.defaultCharset());
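Putting the file-reading fix together with the URL encoding from the question, a minimal sketch could look like this. The class name ReadUtf8Demo, the method readFirstLine, and the temporary sample file are assumptions made for illustration; the real code would read its own file and may process more than one line.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadUtf8Demo {

    // Hypothetical helper: reads the first line of the file, always decoding
    // it as UTF-8 regardless of the runtime environment's default charset.
    public static String readFirstLine(Path file) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream(file.toFile()), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a throwaway sample file stands in for the real text file.
        Path file = Files.createTempFile("value", ".txt");
        Files.write(file, "\u6307\u7532\u6CB9".getBytes(StandardCharsets.UTF_8));

        String value = readFirstLine(file);
        System.out.println("p=" + URLEncoder.encode(value, "UTF-8"));

        Files.delete(file);
    }
}
```

Because the charset is passed explicitly, the result no longer depends on whether the program is started from Eclipse or from the terminal.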