
’ character being converted to â€™ in JDBC

I am trying to read a UTF-8 string from my MySQL database, which I created using:

CREATE DATABASE april
  DEFAULT CHARACTER SET utf8
  DEFAULT COLLATE utf8_general_ci;

I make the table of interest using:

DROP TABLE IF EXISTS `article`;
CREATE TABLE `article` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `text` longtext NOT NULL,
  `date_created` timestamp DEFAULT NOW(),
  PRIMARY KEY (`id`)
) CHARACTER SET utf8;

If I run SELECT * FROM article in the MySQL command-line client, I get:

OIL sands output at Nexen’s Long Lake project dropped in February.

However, when I do

ResultSet rs = st.executeQuery(QUERY);

long id = -1;
String text = null;
Timestamp date = null;
while (rs.next()) {
    text = rs.getString("text");
    LOGGER.debug("text=" + text);
}

the output I get is:

text=OIL sands output at Nexenâ€™s Long Lake project dropped in February.

I get my Connection via:

DriverManager.getConnection("jdbc:" + this.dbms + "://" + this.serverHost + ":" + this.serverPort + "/" + this.dbName + "?useUnicode&user=" + this.username + "&password=" + this.password);

I've also tried, instead of the useUnicode parameter:

characterEncoding=UTF-8
and
characterEncoding=utf8

I also tried, instead of the line text = rs.getString("text"), reading the raw bytes and decoding them with several encodings:

byte[] temp = rs.getBytes("text");
String[] encodings = new String[]{"US-ASCII", "ISO-8859-1", "UTF-8", "UTF-16BE", "UTF-16LE", "UTF-16", "Latin1"};
for (String encoding : encodings) {
    text = new String(temp, encoding);
    LOGGER.debug(encoding + ": " + text);
}
// which output:
US-ASCII: OIL sands output at Nexen��������s Long Lake project dropped in February.
ISO-8859-1: OIL sands output at Nexenââ¬â¢s Long Lake project dropped in February.
UTF-8: OIL sands output at Nexenâ€™s Long Lake project dropped in February.
UTF-16BE: 佉䰠獡湤猠潵瑰畴⁡琠乥硥滃ꋢ芬ꉳ⁌潮朠䱡步⁰牯橥捴⁤牯灰敤⁩渠䙥扲畡特�
UTF-16LE: 䥏⁌慳摮⁳畯灴瑵愠⁴敎數썮겂蓢玢䰠湯⁧慌敫瀠潲敪瑣搠潲灰摥椠敆牢慵祲�
UTF-16: 佉䰠獡湤猠潵瑰畴⁡琠乥硥滃ꋢ芬ꉳ⁌潮朠䱡步⁰牯橥捴⁤牯灰敤⁩渠䙥扲畡特�
Latin1: OIL sands output at Nexenââ¬â¢s Long Lake project dropped in February.

I load the strings into the DB from a file of pre-defined SQL statements. This file is UTF-8 encoded.

mysql -u april -p -D april < insert_articles.sql

This file includes the line:

 INSERT INTO article (text) value ("OIL sands output at Nexen’s Long Lake project dropped in February.");

When I print out that file within my application using:

BufferedReader reader = new BufferedReader(new FileReader(new File("/home/path/to/file/sql_article_inserts.sql")));
 String str;
 while((str = reader.readLine()) != null) {
     LOGGER.debug("LINE: " + str);
 }

I get the correct, expected output:

LINE: INSERT INTO article (text) value ("OIL sands output at Nexen’s Long Lake project dropped in February.");

Any help would be much appreciated.

Some system details: I am running on Linux (Ubuntu).

Edits:
* Edited to specify OS
* Edited to detail output of reading sql input file.
* Edited to specify more about how the data is inserted into the DB.
* Edited to fix typo in code, and clarify example.

barryred asked Dec 02 '25 16:12



1 Answer

Is it possible you're reading the log file with the wrong encoding? My guess is windows-1252.

UTF-8: OIL sands output at Nexenâ€™s Long Lake project dropped in February.

If this is what appears in the log, do a hex dump of the log file. If the data is UTF-8, you would expect the sequence Nexen’s to be stored as the bytes 4E 65 78 65 6E E2 80 99 73. If some other application reads those bytes using a native ANSI encoding such as windows-1252, it will decode E2 80 99 as three separate characters and show Nexenâ€™s.
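The byte-level reasoning above can be checked without a database. This is a minimal sketch, assuming the viewer falls back to windows-1252; the class and method names (MojibakeDemo, toHex, misread) are made up for illustration and are not from the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: format bytes as space-separated upper-case hex.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Hypothetical helper: encode as UTF-8, then decode those bytes as
    // windows-1252, which is what a mis-configured viewer would do.
    static String misread(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // U+2019 is the right single quotation mark used in "Nexen’s".
        String original = "Nexen\u2019s";
        System.out.println(toHex(original.getBytes(StandardCharsets.UTF_8)));
        // What appears on screen here depends on the console's own encoding.
        System.out.println(misread(original));
    }
}
```

If the hex dump of the log contains E2 80 99 at that position, the file itself is valid UTF-8 and only the viewer is mis-decoding it.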

To confirm, you can also dump the individual characters of the return value to see if they are correct in UTF-16:

//untested
for(char ch : text.toCharArray()) {
   System.out.printf("%04x%n", (int) ch);
}

I'm assuming all data is in the BMP, so you can just look up the results in the Unicode charts.
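As a rough guide to reading that dump, the sketch below shows the code units you would see for a correct apostrophe versus one that was mis-decoded before it reached the Java string. The class and method names (CharDump, codeUnits) are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class CharDump {
    // Hypothetical helper: return each UTF-16 code unit as 4-digit hex.
    static List<String> codeUnits(String text) {
        List<String> out = new ArrayList<>();
        for (char ch : text.toCharArray()) {
            out.add(String.format("%04x", (int) ch));
        }
        return out;
    }

    public static void main(String[] args) {
        // Correct: a single RIGHT SINGLE QUOTATION MARK.
        System.out.println(codeUnits("\u2019"));             // [2019]
        // Mis-decoded: a-circumflex, euro sign, trade mark sign.
        System.out.println(codeUnits("\u00E2\u20AC\u2122")); // [00e2, 20ac, 2122]
    }
}
```

Seeing 2019 in the dump means the string in memory is fine and the problem is in how the log is displayed; seeing 00e2 20ac 2122 means the text was already wrong before it was logged.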

McDowell answered Dec 04 '25 06:12



