
Encoding cp1252

When I try the following in Java:

System.out.println(System.getProperty("file.encoding"));

I get cp1252 as the encoding.

Is there a way to know where this value is coming from? (Like Environment variables or something)

I would like to print the value of encoding on command prompt using some command like systeminfo on Windows XP.

asked Dec 01 '09 15:12 by Arun



3 Answers

cp1252 is the default encoding on English (and most Western European) installations of MS Windows; it is what Microsoft refers to as the ANSI code page. By default, Java derives its default character encoding from the system locale, so the value you get depends on the system. In general I don't like to rely on default encodings. If I know my text will be pure ASCII I ignore the issue; otherwise I set the encoding explicitly when instantiating InputStreamReader, OutputStreamWriter, String etc. or when calling getBytes.
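A minimal sketch of what setting the encoding explicitly can look like. The class and method names (ExplicitEncoding, write, read) are made up for illustration, and it assumes the JRE ships the windows-1252 charset, which is not one of the charsets every Java platform is required to support.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitEncoding {
    // Encode text through an OutputStreamWriter that is given an explicit charset.
    static byte[] write(String text, Charset cs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(out, cs)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Decode bytes through an InputStreamReader that is given an explicit charset.
    static String read(byte[] bytes, Charset cs) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), cs)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: windows-1252 is available in this JRE.
        Charset cp1252 = Charset.forName("windows-1252");
        String text = "caf\u00e9"; // "café", escaped so the source file's own encoding does not matter

        byte[] ansiBytes = write(text, cp1252);
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);

        System.out.println(ansiBytes.length + " bytes in windows-1252, "
                + utf8Bytes.length + " bytes in UTF-8");
        System.out.println(read(ansiBytes, cp1252).equals(text));
        System.out.println(new String(utf8Bytes, StandardCharsets.UTF_8).equals(text));
    }
}
```

Because every conversion names its charset, the result does not change when file.encoding does.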

Note that cp1252 is not the default encoding of the Windows command prompt. The console uses the even older OEM code page (cp437 on US English systems), which you can see (and change) using the chcp command.

answered Oct 13 '22 03:10 by Dan


That value is, on Windows at least, the legacy codepage used for non-Unicode text. It's what the OS converts strings to and from when you use the old ANSI APIs. For any newer program it should have no effect (that being said, I still see enough programs that use the A and not the W variants of API functions, sadly).

For your Java program none of that should matter for strings in memory, as Java uses Unicode internally. You will need it, however, if you want to read or write text files in the system's codepage.
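As a rough sketch of reading and writing a file in the system's codepage: the snippet below asks the JVM for its default charset and uses it for a round trip through a temporary file. The class and method names (SystemCodepageFile, roundTrip) are hypothetical. As far as I know, on JDK 18 and later the default charset is UTF-8 unless file.encoding is overridden, so on those versions Charset.defaultCharset() may not reflect the Windows ANSI codepage.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

public class SystemCodepageFile {
    // Write text to a temporary file in the given charset, then read it back
    // with the same charset. The temporary file is deleted afterwards.
    static String roundTrip(String text, Charset cs) {
        try {
            Path tmp = Files.createTempFile("codepage-demo", ".txt");
            try {
                Files.write(tmp, text.getBytes(cs));
                return new String(Files.readAllBytes(tmp), cs);
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The charset the JVM uses when no charset is specified.
        Charset system = Charset.defaultCharset();
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + system);
        System.out.println(roundTrip("plain ASCII text", system));
    }
}
```

Passing the charset in as a parameter keeps the dependence on the system setting in one visible place, so it is easy to swap in a fixed charset later.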

For the command prompt, that encoding is of little significance, as the console by default uses the OEM encoding, which mimics the one from the DOS era (850 or 437 are pretty common).
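To illustrate why the ANSI and OEM codepages matter separately, here is a small sketch that decodes the same byte under each of them. The class and method names (AnsiVersusOem, decode) are made up for this example, and it assumes the JRE provides the windows-1252, IBM437 and IBM850 charsets.

```java
import java.nio.charset.Charset;

public class AnsiVersusOem {
    // Decode a single byte value using the named charset.
    static String decode(int b, String charsetName) {
        return new String(new byte[] {(byte) b}, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        int b = 0xE9; // the byte for "é" in windows-1252
        for (String name : new String[] {"windows-1252", "IBM437", "IBM850"}) {
            String s = decode(b, name);
            // Print the code point rather than the character itself, so the
            // output does not depend on the console's own encoding.
            System.out.printf("%-12s -> U+%04X%n", name, (int) s.charAt(0));
        }
    }
}
```

The same byte comes out as U+00E9 (é) under windows-1252, U+0398 (Θ) under IBM437 and U+00DA (Ú) under IBM850, which is the kind of mismatch you see when bytes written in the ANSI codepage are displayed by a console using the OEM one.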

answered Oct 13 '22 05:10 by Joey


Since this doesn't really have anything to do with Java, you could just opt to use a WSH script:

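' Save this script as printANSI.vbs
' Usage: cscript /Nologo printANSI.vbs
' Reads the active ANSI code page (ACP) from the registry and prints it.
' CurrentControlSet is used so the value comes from the control set in use.
Set objShell = CreateObject("WScript.Shell")
cp = objShell.RegRead("HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet" & _
                      "\Control\Nls\CodePage\ACP")
WScript.Echo cp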

See also the chcp command; you may want to read up on how encoding works on the Windows command prompt (some links in this blog post).

answered Oct 13 '22 04:10 by McDowell