
How to convert custom encoded file to UTF-8 (in Java or with a dedicated tool)

A legacy application I'm rewriting in Java uses a custom encoding (similar to Win-1252) for its data storage. For the new system I'm building, I'd like to replace this with UTF-8.

So I need to convert those files to UTF-8 to feed my database. I know the character map used, but it's not any of the widely known ones. E.g., "A" is at position 0x41 (as in Win-1252), but at 0x42 there is a character whose Unicode code point is U+0102 (Ă), and so on. Is there an easy way to decode and convert those files with Java?

I've read many posts already, but they all dealt with industry-standard encodings of some kind, not with custom ones. I expect it's possible to create a custom java.nio.charset.CharsetDecoder or java.nio.charset.Charset and pass it to a java.io.InputStreamReader, as described in the first answer here?
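For reference, this is the kind of minimal sketch I have in mind — LegacyCharset and LEGACY_MAP are placeholder names, and the table would be filled from my character map:

    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CharsetEncoder;
    import java.nio.charset.CoderResult;

    public class LegacyCharset extends Charset {
        // 256-entry table: index = legacy byte value, value = Unicode char
        static final char[] LEGACY_MAP = new char[256];
        static {
            LEGACY_MAP[0x41] = 'A';      // same as Win-1252
            LEGACY_MAP[0x42] = '\u0102'; // differs from Win-1252
            // ... remaining entries from the custom map ...
        }

        public LegacyCharset() {
            super("x-legacy", null); // "x-" prefix for a non-registered name
        }

        @Override
        public boolean contains(Charset cs) {
            return cs.equals(this);
        }

        @Override
        public CharsetDecoder newDecoder() {
            return new CharsetDecoder(this, 1.0f, 1.0f) { // 1 byte -> 1 char
                @Override
                protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                    while (in.hasRemaining()) {
                        if (!out.hasRemaining()) {
                            return CoderResult.OVERFLOW; // output buffer full
                        }
                        out.put(LEGACY_MAP[in.get() & 0xff]);
                    }
                    return CoderResult.UNDERFLOW; // all input consumed
                }
            };
        }

        @Override
        public CharsetEncoder newEncoder() {
            // decode-only; the new system writes UTF-8 instead
            throw new UnsupportedOperationException();
        }
    }

An InputStreamReader built with new LegacyCharset() would then decode the files transparently:

    Reader reader = new InputStreamReader(new FileInputStream("legacy.dat"), new LegacyCharset());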

Any suggestions welcome.

asked Jan 20 '11 by mmm

People also ask

How do I convert to UTF-8 in Java?

In Java, the OutputStreamWriter accepts a charset to encode the character streams into byte streams. We can pass StandardCharsets.UTF_8 to the OutputStreamWriter constructor to write data to a UTF-8 file.
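A minimal sketch (the file name out.txt is just an example):

    import java.io.FileOutputStream;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.nio.charset.StandardCharsets;

    public class WriteUtf8 {
        public static void main(String[] args) throws Exception {
            // try-with-resources flushes and closes the writer automatically
            try (Writer out = new OutputStreamWriter(
                    new FileOutputStream("out.txt"), StandardCharsets.UTF_8)) {
                out.write("A\u0102"); // encoded as UTF-8 bytes on disk
            }
        }
    }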

How do I change the encoding to UTF-8 in Eclipse?

In Eclipse, go to Preferences > General > Workspace and select UTF-8 as the Text File Encoding. This should set the encoding for all the resources in your workspace.


1 Answer

No need for anything complicated. Just make an array of 256 chars, mapping each byte value to its Unicode character:

static char[] map = { ... 'A', '\u0102', ... };

then

    int b;
    while ((b = in.read()) != -1) { // read each byte from the source
        char c = map[b & 0xff];     // 0xff mask keeps the index unsigned (0..255)
        target.write(c);
    }
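Putting it together, a complete conversion pass could look like this (the file names and the Convert class are illustrative; map is the 256-entry table above):

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.InputStream;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.nio.charset.StandardCharsets;

    public class Convert {
        static final char[] map = new char[256]; // filled from the custom character map

        public static void main(String[] args) throws Exception {
            try (InputStream in = new BufferedInputStream(new FileInputStream("legacy.dat"));
                 Writer target = new OutputStreamWriter(
                         new FileOutputStream("converted.txt"), StandardCharsets.UTF_8)) {
                int b;
                while ((b = in.read()) != -1) {
                    target.write(map[b & 0xff]);
                }
            }
        }
    }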
answered Sep 28 '22 by irreputable