Reading special characters from File - Java

Question

I am reading data from a text file with the following properties:

Encoding: ANSI
File Type: PC

The file contains a lot of special characters, such as the degree symbol (º). I am reading it with the following code:

File file = new File("C:\\X\\Y\\SpecialCharacter.txt");
// Decodes the file's bytes as UTF-8
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"));

If the file encoding is ANSI, the code above does not read the special characters properly. For example, for this line in the file:
"Lower heat and simmer until product reaches internal temperature of 165ºF"
reader.readLine() returns:
"Lower heat and simmer until product reaches internal temperature of 165�F"

When I re-save the file as UTF-8, the line is read exactly as it appears in the file, with the special characters intact.
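This is consistent with how the two encodings store the character: an ANSI (windows-1252) file holds º as the single byte BA, while re-saving as UTF-8 rewrites it as the two bytes C2 BA, which is what a UTF-8 reader expects. A minimal sketch comparing the byte sequences; the class and helper names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedBytes {
    // Formats the bytes of s in the given charset as space-separated hex, e.g. "C2 BA"
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String ordinal = "\u00BA"; // º, the character stored as byte 0xBA in the hexdump
        System.out.println("windows-1252: " + hex(ordinal, Charset.forName("windows-1252"))); // BA
        System.out.println("UTF-8:        " + hex(ordinal, StandardCharsets.UTF_8));          // C2 BA
    }
}
```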

My question: at what point does the data get messed up, when it is stored in the file or when it is read from the file? Opening the file in Notepad displays all the special characters properly. How does that happen?

Hexdump output:

          -0 -1 -2 -3  -4 -5 -6 -7  -8 -9 -A -B  -C -D -E -F

00000000- 4C 6F 77 65  72 20 68 65  61 74 20 61  6E 64 20 73 [Lower heat and s]
00000001- 69 6D 6D 65  72 20 75 6E  74 69 6C 20  70 72 6F 64 [immer until prod]
00000002- 75 63 74 20  72 65 61 63  68 65 73 20  69 6E 74 65 [uct reaches inte]
00000003- 72 6E 61 6C  20 74 65 6D  70 65 72 61  74 75 72 65 [rnal temperature]
00000004- 20 6F 66 20  31 36 35 BA  46                       [ of 165.F       ]
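The last row of the hexdump can be decoded both ways to see where the damage happens. A minimal sketch, assuming the file is windows-1252 (the class name is illustrative): as UTF-8, the lone byte 0xBA is not a valid sequence and is replaced by U+FFFD, while windows-1252 maps it to U+00BA. Strictly speaking, U+00BA is the masculine ordinal indicator (º); a true degree sign (°) is U+00B0, stored as 0xB0 in windows-1252.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeHexdump {
    // Last row of the hexdump: " of 165" (20 6F 66 20 31 36 35), then BA 46
    static final byte[] TAIL = {0x20, 0x6F, 0x66, 0x20, 0x31, 0x36, 0x35, (byte) 0xBA, 0x46};

    public static void main(String[] args) {
        // UTF-8: 0xBA is a continuation byte with no lead byte, so it becomes U+FFFD
        String asUtf8 = new String(TAIL, StandardCharsets.UTF_8);
        // windows-1252 (Cp1252): 0xBA maps directly to U+00BA
        String asCp1252 = new String(TAIL, Charset.forName("windows-1252"));
        // Index 7 holds the decoded special character in both strings
        System.out.println(Integer.toHexString(asUtf8.charAt(7)));   // fffd
        System.out.println(Integer.toHexString(asCp1252.charAt(7))); // ba
    }
}
```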

Jon Skeet · Accepted Answer

"ANSI" is not a particular encoding - it's a whole collection of encodings. You need to use the right encoding when reading the file. For example, it's entirely possible that you're using the Windows-1252 encoding, which means you may want to try passing in "Cp1252" as the encoding name.
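To illustrate why "ANSI" alone is not enough, the sketch below decodes the same byte (0xBA, from the hexdump) with three Windows code pages that are all commonly called ANSI. The class and method names are illustrative; the charset names are standard Java names, and Java also accepts "Cp1252" as an alias for windows-1252.

```java
import java.nio.charset.Charset;

public class AnsiCodePages {
    // Decodes a single byte with the named charset and returns the resulting code point
    static int decodeByte(byte b, String charsetName) {
        return new String(new byte[] {b}, Charset.forName(charsetName)).codePointAt(0);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xBA; // the special-character byte from the hexdump
        for (String name : new String[] {"windows-1250", "windows-1251", "windows-1252"}) {
            System.out.printf("%s -> U+%04X%n", name, decodeByte(b, name));
        }
    }
}
```

Only windows-1252 yields º (U+00BA); windows-1250 and windows-1251 map the same byte to ş and є respectively, so picking the wrong "ANSI" code page corrupts the text just as silently.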

In fact, you're passing in "UTF-8", which isn't one of the encodings typically referred to as ANSI. You need to find out the exact encoding that the file uses, and then pass that name to the InputStreamReader constructor.
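A minimal sketch of the question's reader with only the encoding argument changed, assuming the file really is windows-1252. The `firstLine` helper and the temporary demo file are hypothetical; in the real program the path would be the one from the question.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReadWithCp1252 {
    // Reads the first line of a file, decoding its bytes with the named encoding
    static String firstLine(File file, String encoding) throws IOException {
        // try-with-resources closes the reader and the underlying stream
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), encoding))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Demo file with the bytes "165", 0xBA, "F"; the real program would use
        // new File("C:\\X\\Y\\SpecialCharacter.txt") instead
        Path tmp = Files.createTempFile("special", ".txt");
        Files.write(tmp, new byte[] {0x31, 0x36, 0x35, (byte) 0xBA, 0x46});
        String line = firstLine(tmp.toFile(), "Cp1252");
        // Prints "ba": the byte was decoded as U+00BA rather than U+FFFD
        System.out.println(Integer.toHexString(line.charAt(3)));
        Files.delete(tmp);
    }
}
```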

My question: at what point does the data get messed up, when it is stored in the file or when it is read from the file?

Assuming the encoding is capable of representing all the characters you're interested in, it's only when you read the file. Basically, you're trying to read it as if it's in one encoding, when it's actually in another. Notepad is either performing some sort of heuristic encoding detection, or it happens to use the right default for this particular situation.
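Notepad's actual detection logic isn't described here, but a common heuristic of that kind can be sketched: try decoding strictly as UTF-8 and fall back to windows-1252 if the bytes are not valid UTF-8. This is an illustration under that assumption, not what Notepad does; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class GuessEncoding {
    // True if the bytes are valid UTF-8; a REPORT decoder throws instead of substituting U+FFFD
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    // Naive guess: UTF-8 if the bytes decode cleanly, otherwise windows-1252
    static Charset guess(byte[] bytes) {
        return isValidUtf8(bytes) ? StandardCharsets.UTF_8 : Charset.forName("windows-1252");
    }

    public static void main(String[] args) {
        byte[] ansi = {0x31, 0x36, 0x35, (byte) 0xBA, 0x46};              // "165ºF" saved as windows-1252
        byte[] utf8 = {0x31, 0x36, 0x35, (byte) 0xC2, (byte) 0xBA, 0x46}; // "165ºF" saved as UTF-8
        System.out.println(guess(ansi).name()); // windows-1252
        System.out.println(guess(utf8).name()); // UTF-8
    }
}
```

Pure-ASCII input is valid UTF-8, which is harmless because ASCII decodes identically in both. The guess can still be wrong, though: some windows-1252 byte sequences (for example C3 A9, which is "Ã©") also happen to be valid UTF-8.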

Tags: java, file-io, special-characters