When I try the following in Java:
System.out.println(System.getProperty("file.encoding"));
I get cp1252 as the encoding.
Is there a way to know where this value is coming from? (Like Environment variables or something)
I would like to print the value of encoding on command prompt using some command like systeminfo on Windows XP.
Every character in Windows-1252 also exists in Unicode and can therefore be encoded in UTF-8, but the byte-by-byte representations differ. Windows-1252 stores its non-ASCII characters as single bytes in the range 128 to 255, whereas UTF-8 encodes those same characters as multi-byte sequences.
CP-1252 is an 8-bit character encoding based on ASCII (identical up to code point 127). It is the default ANSI codepage for graphical applications on English and most Western European installations of Windows.
Windows-1252 and ASCII: The first part of Windows-1252 (entity numbers from 0-127) is the original ASCII character-set. It contains numbers, upper and lowercase English letters, and some special characters.
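To make the byte-level difference concrete, here is a minimal sketch that prints the bytes of a few characters in both encodings. The class name ByteComparison and the helper hex are made up for this illustration; it also assumes the JVM ships the windows-1252 charset, which standard desktop JDKs do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteComparison {
    // Hypothetical helper: renders the bytes of `text` in `charset`
    // as space-separated, two-digit uppercase hex values.
    public static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: this charset is available in the running JVM.
        Charset cp1252 = Charset.forName("windows-1252");

        // Unicode escapes keep this source file pure ASCII, so the result
        // does not depend on the encoding the compiler assumes.
        String[] samples = {"A", "\u00E9", "\u20AC"}; // A, e-acute, euro sign

        for (String s : samples) {
            System.out.println("cp1252: " + hex(s, cp1252)
                    + "   UTF-8: " + hex(s, StandardCharsets.UTF_8));
        }
        // cp1252: 41   UTF-8: 41
        // cp1252: E9   UTF-8: C3 A9
        // cp1252: 80   UTF-8: E2 82 AC
    }
}
```

The ASCII letter is the same single byte in both encodings, while the two non-ASCII characters take one byte in cp1252 and two or three bytes in UTF-8.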
The default character encoding scheme in Eclipse on Windows is typically cp1252, inherited from the platform default. You may be required to change this scheme, for example, if you intend to submit orders that contain character sets from languages such as Chinese, Japanese, or Norwegian. In this case, you can define the character encoding scheme as UTF-8.
cp1252 is the default encoding on English installations of MS Windows (what Microsoft refers to as ANSI). By default, Java derives its default character encoding from the system locale, so the actual value is system dependent. In general I don't like to rely on default encodings. If I know my text will be pure ASCII I ignore it; otherwise I set the encoding explicitly when instantiating InputStreamReader, OutputStreamWriter, String etc., or when calling getBytes.
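As a rough sketch of what setting the encoding explicitly can look like, the following passes a charset to each of the classes and methods mentioned above. The class name ExplicitEncoding and its two helpers are invented for this example, and UTF-8 is just an assumed choice; any charset works as long as reader and writer agree.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class ExplicitEncoding {
    // Hypothetical helper: encodes text through an OutputStreamWriter
    // with an explicit charset instead of the platform default.
    public static byte[] write(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Hypothetical helper: decodes bytes through an InputStreamReader
    // with the same explicit charset.
    public static String read(byte[] data) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "na\u00EFve"; // contains one non-ASCII character

        byte[] viaWriter = write(text);
        byte[] viaGetBytes = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(viaWriter.length);    // 6
        System.out.println(viaGetBytes.length);  // 6
        System.out.println(read(viaWriter).equals(text));                                  // true
        System.out.println(new String(viaGetBytes, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

Because the charset is named at every conversion, the result is the same regardless of what file.encoding happens to be on the machine running the code.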
Note that cp1252 is not the default encoding on the Windows command prompt. That is the even older cp437, which you can see (and change) using the chcp command.
That value is, on Windows at least, the legacy codepage used for non-Unicode text. It's what the OS converts strings to and from when you use the old ANSI APIs. For any newer program it should have no effect (that being said, I still see enough programs that use the A and not the W variants of API functions, sadly).
For your Java program none of that should matter, as Java uses Unicode exclusively for its strings. If you want to write or read text files in the system's codepage, then you'll need it, however.
For the command prompt, however, that encoding is of little relevance, as the console by default uses the OEM encoding, which mimics the one from the DOS era (850 or 437 is pretty common).
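If you want to see what the JVM itself has picked up, a small sketch like the one below prints the relevant values side by side. The class name EncodingInfo and the method describe are made up here. Note the assumptions: sun.jnu.encoding is an internal, unsupported property, and native.encoding only exists on newer JDKs (17 and later, as far as I know), so either may print null.

```java
import java.nio.charset.Charset;

public class EncodingInfo {
    // Hypothetical helper: builds a short report of the encoding-related
    // values visible to the running JVM.
    public static String describe() {
        StringBuilder sb = new StringBuilder();
        sb.append("Charset.defaultCharset() = ")
          .append(Charset.defaultCharset().name())
          .append('\n');

        // Assumption: the last two keys are not guaranteed to be set;
        // System.getProperty returns null for a missing key.
        String[] keys = {"file.encoding", "sun.jnu.encoding", "native.encoding"};
        for (String key : keys) {
            sb.append(key).append(" = ")
              .append(System.getProperty(key))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Running it once normally and once with java -Dfile.encoding=UTF-8 EncodingInfo shows which values follow the operating system and which follow the command-line override.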
Since this doesn't really have anything to do with Java, you could just opt to use a WSH script:
' save this script as printANSI.vbs
' usage: cscript /Nologo printANSI.vbs
Set objShell = CreateObject("WScript.Shell")
' CurrentControlSet always points at the active control set
cp = objShell.RegRead("HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet" & _
"\Control\Nls\CodePage\ACP")
WScript.Echo cp
See also the chcp command; you may want to read up on how encoding works on the Windows command prompt (some links in this blog post).