Read UTF-16 chars from a file and store them as UTF-8

Tags: java, file, utf-8

I have a Person POJO with a name attribute, which I store in the persons table of my database. The database server is MySQL with UTF-8 as the default server encoding, the persons table is an InnoDB table that was also created with UTF-8 as its default encoding, and my connection string specifies UTF-8 as the connection encoding.

I need to create and store new Person POJOs by reading their names from a text file (persons.txt) that contains one name per line. The file, however, is encoded in UTF-16.

persons.txt:

John
Μαρία
Hélène
etc.

Here is some sample code:

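import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

PersonDao dao = new PersonDao();
File file = new File("persons.txt");
// try-with-resources closes the reader even if save() throws
try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_16))) {
    String line;
    // Each line is decoded from UTF-16 bytes into a Java String
    while ((line = reader.readLine()) != null) {
        Person p = new Person();
        p.setName(line.trim());
        dao.save(p);
    }
}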

To sum up, I read the characters as UTF-16, store them in local variables, and persist them as UTF-8.

My questions: Does any character conversion take place during this procedure? If so, at what point does it happen? Could I end up storing broken characters because of the UTF-16 -> UTF-8 workflow?

asked Feb 24 '11 12:02

Argyro Kazaki

1 Answer

InputStreamReader decodes the bytes of the stream from their external representation in the specified encoding (UTF-16 in your case) into Java's internal representation (char, String), which is also UTF-16. So no conversion between different encodings happens at this step: the reader only turns bytes into chars, using the byte order mark (if present) to determine the byte order.
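To make the decoding step concrete, here is a minimal sketch that feeds UTF-16 bytes through the same kind of reader as in the question and checks what comes out. The class and method names are made up for illustration, and the names are written with Unicode escapes only so that the example does not depend on the encoding of the source file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the question's code.
class Utf16DecodeSketch {

    // Decodes UTF-16 bytes into trimmed lines, the same way the question's reader does.
    static List<String> readNames(byte[] utf16Bytes) {
        List<String> names = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(utf16Bytes), StandardCharsets.UTF_16))) {
            String line;
            while ((line = reader.readLine()) != null) {
                names.add(line.trim());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Builds a little-endian UTF-16 byte array that starts with the BOM FF FE.
    static byte[] littleEndianWithBom(String text) {
        byte[] body = text.getBytes(StandardCharsets.UTF_16LE);
        byte[] result = new byte[body.length + 2];
        result[0] = (byte) 0xFF;
        result[1] = (byte) 0xFE;
        System.arraycopy(body, 0, result, 2, body.length);
        return result;
    }

    public static void main(String[] args) {
        // "John", "Maria" in Greek letters, and "Helene" with accents, one per line.
        String content = "John\n\u039C\u03B1\u03C1\u03AF\u03B1\nH\u00E9l\u00E8ne\n";

        // Encoding with UTF_16 writes a big-endian BOM (FE FF) followed by the text.
        byte[] bigEndian = content.getBytes(StandardCharsets.UTF_16);
        byte[] littleEndian = littleEndianWithBom(content);

        // Both byte orders decode to the same three Strings.
        System.out.println(readNames(bigEndian).equals(readNames(littleEndian)));
        System.out.println(readNames(bigEndian).size());
    }
}
```

In both cases the BOM is consumed by the decoder and does not show up in the first name. If the file has no BOM at all, Java's UTF-16 decoder assumes big-endian, so a BOM-less little-endian file would need `UTF_16LE` instead.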

The JDBC driver converts the internal representation of Strings to the database encoding when it sends them to the server, so you don't need to do that yourself (though with MySQL you do need to specify the proper encoding in the connection string).
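As a rough illustration of the connection-string part, the sketch below builds a MySQL JDBC URL that carries the Connector/J properties `useUnicode` and `characterEncoding`. The host, port and schema name are placeholders, and the helper class is hypothetical; your setup may pass these properties in another way (for example through a DataSource configuration).

```java
// Hypothetical helper, shown only to illustrate the shape of the URL.
class JdbcUrlSketch {

    // Builds a Connector/J URL that asks the driver to talk UTF-8 to the server.
    static String buildUrl(String host, int port, String schema) {
        return "jdbc:mysql://" + host + ":" + port + "/" + schema
                + "?useUnicode=true&characterEncoding=UTF-8";
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with your own server and schema.
        System.out.println(buildUrl("localhost", 3306, "mydb"));
    }
}
```

The resulting string is what you would hand to `DriverManager.getConnection(...)` together with your credentials.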

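If the input encoding and (in the case of MySQL) the database encoding are specified correctly, no data is lost during these conversions, since UTF-8 and UTF-16 are both encodings of the same character set (Unicode). One MySQL-specific caveat: its utf8 character set stores at most three bytes per character, so characters outside the Basic Multilingual Plane need utf8mb4.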
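The lossless round trip can be checked in plain Java, without a database. The following is a small sketch under that assumption; the class and method names are invented for the example. It decodes UTF-16 bytes into a String, re-encodes the String as UTF-8, and decodes it again.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for the UTF-16 -> String -> UTF-8 path.
class RoundTripSketch {

    // Re-encodes UTF-16 bytes (with BOM) as UTF-8 by going through a Java String.
    static byte[] utf16ToUtf8(byte[] utf16Bytes) {
        String decoded = new String(utf16Bytes, StandardCharsets.UTF_16);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Maria" in Greek letters, written with escapes to avoid source-encoding issues.
        String name = "\u039C\u03B1\u03C1\u03AF\u03B1";

        byte[] utf16 = name.getBytes(StandardCharsets.UTF_16); // BOM + 2 bytes per letter
        byte[] utf8 = utf16ToUtf8(utf16);                      // 2 bytes per Greek letter, no BOM

        // The byte sequences differ, but the text they represent is identical.
        String restored = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(restored.equals(name));
        System.out.println(utf16.length + " bytes as UTF-16, " + utf8.length + " bytes as UTF-8");

        // A character outside the Basic Multilingual Plane (U+1F600) is a surrogate
        // pair in Java and takes four bytes in UTF-8.
        String emoji = "\uD83D\uDE00";
        System.out.println(emoji.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

The four-byte case at the end is the one where the column's character set matters on the MySQL side; names like the ones in the question stay within three bytes per character.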

answered Nov 09 '22 22:11

axtavt
