
Is there a way to check the charset encoding of a .txt file with Java?

Is there a way to check with Java whether a text file (.txt) is encoded as Unicode or UTF-8?

Zookey asked Jun 13 '12 20:06


People also ask

How do I know the encoding of a text file?

Open the file with Notepad++ and you will see the name of the encoding in the bottom-right corner. In the Encoding menu you can change the encoding and save the file.

How can I tell if a text file is UTF-8?

To verify whether a file is valid in a given encoding such as ASCII, ISO-8859-1 or UTF-8, a good solution is to use the 'iconv' command.

What encoding do TXT files use?

The ASCII character set is the most common compatible subset of character sets for English-language text files, and is generally assumed to be the default file format in many situations.


2 Answers

You cannot know with absolute certainty which charset is used in the general case. I found this to be a good read.

http://illegalargumentexception.blogspot.co.uk/2009/05/java-rough-guide-to-character-encoding.html

See especially the section "Automatic detection of encoding".
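One case where the encoding can be read off fairly reliably is a file that starts with a byte order mark (BOM). The following is only a minimal sketch of that check, not code from the linked article; the class name `BomSniffer` and the method `detectBom` are made up for illustration. A file without a BOM tells you nothing this way, and the UTF-32LE BOM starts with the same two bytes as the UTF-16LE one, so this sketch would misreport it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: looks only at the first bytes of a file for a BOM.
class BomSniffer {

    /** Returns the charset name implied by a BOM, or null if no known BOM is present. */
    public static String detectBom(byte[] head) {
        if (head.length >= 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFE && (head[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (head.length >= 2 && (head[0] & 0xFF) == 0xFF && (head[1] & 0xFF) == 0xFE) {
            // Limitation: a UTF-32LE BOM (FF FE 00 00) also lands here.
            return "UTF-16LE";
        }
        return null; // no BOM: the encoding is still unknown
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java BomSniffer <file>");
            return;
        }
        byte[] head = new byte[3];
        int n;
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // One read call is used for brevity; it may return fewer than 3 bytes.
            n = Math.max(in.read(head), 0);
        }
        String bom = detectBom(Arrays.copyOf(head, n));
        System.out.println(bom == null ? "no BOM found" : "BOM suggests " + bom);
    }
}
```

Many UTF-8 files are written without a BOM, so a `null` result here means "unknown", not "not UTF-8".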

Paul Grime answered Sep 25 '22 12:09


Um, theoretically, how would you know whether it is Unicode?

This is the real question. Truthfully, you cannot know, but you can make a decent guess.

See "Java: How to determine the correct charset encoding of a stream" for more details. :)
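To make the "decent guess" idea concrete, here is a rough sketch (not taken from the linked question) that tries a few candidate charsets with a strict decoder and returns the first one that decodes without errors. The class name `CharsetGuesser`, the method names, and the candidate order are all assumptions chosen for illustration. The ISO-8859-1 fallback is a pure guess, since every byte sequence is valid in that encoding.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: guesses a charset by strict trial decoding.
class CharsetGuesser {

    /** True if the bytes decode in the given charset with no malformed or unmappable input. */
    public static boolean decodesCleanly(byte[] data, Charset cs) {
        CharsetDecoder decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    /** Returns a plausible charset for the data; this is a guess, not a proof. */
    public static Charset guess(byte[] data) {
        if (decodesCleanly(data, StandardCharsets.US_ASCII)) {
            return StandardCharsets.US_ASCII;
        }
        if (decodesCleanly(data, StandardCharsets.UTF_8)) {
            // Non-ASCII bytes that form valid UTF-8 are rarely accidental.
            return StandardCharsets.UTF_8;
        }
        // Fallback: any byte sequence decodes as ISO-8859-1, so this says little.
        return StandardCharsets.ISO_8859_1;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java CharsetGuesser <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println("best guess: " + guess(data));
    }
}
```

The order matters: ASCII is checked first because pure ASCII text is also valid UTF-8 and valid ISO-8859-1, so the narrowest match is reported.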

Haakon Løtveit answered Sep 22 '22 12:09