Auto-Detect Character Encoding in Java

Tags: java, io, encoding, bufferedreader

This seems to be a fairly common issue, but I've not yet been able to find a solution, perhaps because it comes in so many flavors. Here it is, though. I'm trying to read some comma-delimited files (occasionally the delimiters are more unusual than commas, but commas will suffice for now).

The files are supposed to be standardized across the industry, but lately we've seen files arriving in many different character sets. I'd like to be able to set up a BufferedReader to compensate for this.

What is a fairly standard way of doing this, and of detecting whether it was successful?

My first thought is to loop through character sets, from simple to complex, until I can read the file without an exception. Not exactly ideal, though...
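For illustration, the loop idea could be sketched roughly as below. One caveat: a BufferedReader wrapped around an InputStreamReader replaces undecodable bytes with U+FFFD rather than throwing, so this sketch uses a CharsetDecoder set to REPORT errors instead. The class name `CharsetProbe`, the method `firstStrictMatch`, and the candidate order are all assumptions made for the example, not an established recipe.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: tries candidate charsets in order, strictest first.
final class CharsetProbe {

    // Assumed ordering. ISO-8859-1 maps every byte to a character, so it never
    // fails and is only useful as the last resort.
    static final List<Charset> CANDIDATES = List.of(
            StandardCharsets.US_ASCII,
            StandardCharsets.UTF_8,
            StandardCharsets.ISO_8859_1);

    /** Returns the first candidate that decodes all bytes without error, if any. */
    static Optional<Charset> firstStrictMatch(byte[] data, List<Charset> candidates) {
        for (Charset cs : candidates) {
            try {
                cs.newDecoder()
                  .onMalformedInput(CodingErrorAction.REPORT)
                  .onUnmappableCharacter(CodingErrorAction.REPORT)
                  .decode(ByteBuffer.wrap(data));
                return Optional.of(cs);
            } catch (CharacterCodingException e) {
                // The bytes are not valid in this charset; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // "naïve,café" written with escapes so the source file encoding does not matter.
        byte[] sample = "na\u00EFve,caf\u00E9".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstStrictMatch(sample, CANDIDATES));
    }
}
```

A successful decode only shows that the bytes are valid in that charset, not that it is the one the sender used. Many single-byte encodings accept almost any input, so this approach mostly helps to separate ASCII and UTF-8 from everything else.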

Thanks for your attention.

asked Feb 07 '12 18:02

Kirk

1 Answer

Mozilla's universalchardet is supposed to be the most efficient detector out there. juniversalchardet is the Java port of it. There is one more port. For more information, read the SO question "Character Encoding Detection Algorithm".
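Whichever detector is used, its guess still has to be turned into a BufferedReader, with a fallback for when detection fails. Below is a minimal sketch of that wiring using only the JDK. The `detector` parameter is a hypothetical stand-in for a library such as juniversalchardet, and `bomDetector` is a toy placeholder that only recognizes byte order marks. Neither reflects the library's actual API.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical glue code: detector result -> Charset -> BufferedReader.
final class DetectedReader {

    /** Maps a detector's answer to a Charset; null or unsupported names yield the fallback. */
    static Charset resolve(String detectedName, Charset fallback) {
        if (detectedName == null) {
            return fallback; // the detector could not decide
        }
        try {
            return Charset.forName(detectedName);
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return fallback;
        }
    }

    /** Toy stand-in detector (an assumption for this sketch): recognizes BOMs only. */
    static String bomDetector(byte[] b) {
        if (b.length >= 3 && (b[0] & 0xFF) == 0xEF && (b[1] & 0xFF) == 0xBB && (b[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (b.length >= 2 && (b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    /** Opens a reader over the bytes using the detected charset, or the fallback. */
    static BufferedReader open(byte[] data, Function<byte[], String> detector, Charset fallback) {
        Charset cs = resolve(detector.apply(data), fallback);
        return new BufferedReader(new InputStreamReader(new ByteArrayInputStream(data), cs));
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "id,name\n1,Kirk\n".getBytes(StandardCharsets.UTF_8);
        try (BufferedReader reader = open(data, DetectedReader::bomDetector, StandardCharsets.UTF_8)) {
            System.out.println(reader.readLine());
        }
    }
}
```

Passing the detector in as a function means a real statistical detector can replace the BOM check without changing the reader code. A null result from the detector is the signal that detection did not succeed.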

answered Oct 10 '22 21:10

Aravind Yarram
