
MySQL: strange LENGTH() behaviour on a utf8 string

I am writing unit tests for query generators, and I ran into trouble with the LENGTH() function.

I run two queries, one after the other:

SHOW VARIABLES LIKE '%character%'

It returns the following result:

array(8) {
  [0] =>
  array(2) {
    'Variable_name' =>
    string(20) "character_set_client"
    'Value' =>
    string(4) "utf8"
  }
  [1] =>
  array(2) {
    'Variable_name' =>
    string(24) "character_set_connection"
    'Value' =>
    string(4) "utf8"
  }
  [2] =>
  array(2) {
    'Variable_name' =>
    string(22) "character_set_database"
    'Value' =>
    string(6) "latin1"
  }
  [3] =>
  array(2) {
    'Variable_name' =>
    string(24) "character_set_filesystem"
    'Value' =>
    string(6) "binary"
  }
  [4] =>
  array(2) {
    'Variable_name' =>
    string(21) "character_set_results"
    'Value' =>
    string(4) "utf8"
  }
  [5] =>
  array(2) {
    'Variable_name' =>
    string(20) "character_set_server"
    'Value' =>
    string(4) "utf8"
  }
  [6] =>
  array(2) {
    'Variable_name' =>
    string(20) "character_set_system"
    'Value' =>
    string(4) "utf8"
  }
  [7] =>
  array(2) {
    'Variable_name' =>
    string(18) "character_sets_dir"
    'Value' =>
    string(26) "/usr/share/mysql/charsets/"
  }
}

My second query is:

SELECT LENGTH('重庆') AS len

It returns 6 instead of 2.

What is wrong here? My character set parameters look fine.

Asked by Alain Tiemblo on Apr 29 '13 at 12:04




2 Answers

I found the answer in the MySQL documentation.

The LENGTH() function counts bytes:

mysql> SELECT LENGTH('重庆') ;
+------------------+
| LENGTH('重庆')   |
+------------------+
|                6 |
+------------------+
1 row in set (0.00 sec)

The CHAR_LENGTH() function counts characters:

mysql> SELECT CHAR_LENGTH('重庆') ;
+-----------------------+
| CHAR_LENGTH('重庆')   |
+-----------------------+
|                     2 |
+-----------------------+
1 row in set (0.00 sec)
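
The same distinction can be reproduced outside MySQL. The following is a minimal Python sketch, assuming the string reaches the server as UTF-8 (as the utf8 connection settings in the question suggest). The helper names `byte_length` and `char_length` are hypothetical, chosen only to mirror the two SQL functions.

```python
def byte_length(text: str, encoding: str = "utf-8") -> int:
    """Count bytes after encoding, analogous to MySQL LENGTH()."""
    return len(text.encode(encoding))


def char_length(text: str) -> int:
    """Count Unicode code points, analogous to MySQL CHAR_LENGTH()."""
    return len(text)


if __name__ == "__main__":
    city = "重庆"
    print(byte_length(city))  # 6: each of the two characters takes 3 bytes in UTF-8
    print(char_length(city))  # 2
```

For plain ASCII input the two helpers return the same number, which is why the difference often goes unnoticed until non-ASCII data shows up.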
Answered by Alain Tiemblo on Oct 06 '22 at 01:10


The two functions behave differently:

LENGTH() always returns the length of the string in bytes. CHAR_LENGTH() returns the length of the string in characters.

With a multibyte character set, the two results differ whenever the string contains a character outside the ASCII range. In UTF-8 the number of bytes per character varies: ASCII characters take one byte, while each of the two Chinese characters here takes three, hence the result of 6. In a fixed-width two-byte encoding such as ucs2, the same string would take 4 bytes.

For example:

SELECT LENGTH('重庆'), CHAR_LENGTH('重庆');
-->   6,  2  
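
How many bytes a character occupies depends on the encoding. The Python sketch below is an illustration and not part of the original answer. It uses a hypothetical helper, `encoded_widths`, to compare UTF-8 with UTF-16 (little-endian, no byte order mark) for a few sample characters.

```python
def encoded_widths(chars, encoding):
    """Map each character to the number of bytes it occupies in `encoding`."""
    return {ch: len(ch.encode(encoding)) for ch in chars}


if __name__ == "__main__":
    # "a" (ASCII), U+00E9 (e with acute accent), two CJK characters, one emoji
    samples = ["a", "\u00e9", "重", "庆", "\U0001F600"]

    # UTF-8 is variable width: 1, 2, 3, 3 and 4 bytes for these samples.
    print(encoded_widths(samples, "utf-8"))

    # UTF-16 uses 2 bytes for each of the first four and 4 bytes for the emoji.
    print(encoded_widths(samples, "utf-16-le"))
```

Note that MySQL's utf8 (utf8mb3) character set stores at most three bytes per character, so a four-byte character such as the emoji above requires utf8mb4.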
Answered by medina on Oct 05 '22 at 23:10