
MySQL 5.5 losing trailing spaces in query

Tags:

mysql

I am building a database in which trailing space is important to the results. When I query for a result I find that

SELECT * FROM `my_table` WHERE `field` = 'a '

returns a result when there is a field whose value is 'a'. I want the trailing space to matter in the result set. I have tried using CHAR, VARCHAR, TEXT, and BLOB. Note that this field is the index of my table.

Can someone show me how to query in a way that makes trailing (and/or leading) spaces count? Do I need to format my table in any special way to make this work?

asked Sep 22 '12 10:09 by taggedzi


2 Answers

This behaviour is by design, and not only in MySQL: when nonbinary strings (CHAR, VARCHAR, TEXT) are compared with =, trailing spaces are ignored.

You can work around it in comparisons by using BINARY:

mysql> select version(), 'a' = 'a ', BINARY 'a' = BINARY 'a ';
+-------------+------------+--------------------------+
| version()   | 'a' = 'a ' | BINARY 'a' = BINARY 'a ' |
+-------------+------------+--------------------------+
| 5.5.25a-log |          1 |                        0 |
+-------------+------------+--------------------------+
1 row in set (0.00 sec)

but not much more. This will help you with SELECTs if whitespace appears, e.g., in user input to a search; but if you actually need to store values that differ only in trailing whitespace, it will be a problem (a unique index can't hold both 'a' and 'a ').
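Applied to a WHERE clause, the BINARY workaround might look like the sketch below. The table binary_demo and its column val are hypothetical names used only for illustration, and the CONCAT brackets are there just to make trailing spaces visible.

```sql
-- Hypothetical demo table (not from the original question).
CREATE TEMPORARY TABLE binary_demo (
  val VARCHAR(20)
);

INSERT INTO binary_demo VALUES ('a'), ('a ');

-- Plain comparison: trailing spaces are ignored, so both rows match.
SELECT CONCAT('[', val, ']') AS val
FROM binary_demo
WHERE val = 'a ';

-- Binary comparison: values are compared byte by byte,
-- so only the row holding 'a ' matches.
SELECT CONCAT('[', val, ']') AS val
FROM binary_demo
WHERE BINARY val = 'a ';
```

Keep in mind that a binary comparison is also case-sensitive, so 'A ' would not match 'a ' in the second query.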

See also

Trailing whitespace in varchar needs to be considered in comparison

You could conceivably reverse the strings in that column, and reverse them back when displaying them: trailing spaces then become leading spaces, and leading spaces do count. Of course this will wreck any ordering based on that column, but if you only test equality or substring existence, it just might work.
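A minimal sketch of the reversal idea, assuming a hypothetical table reversed_demo whose column val_rev stores the reversed string:

```sql
-- Hypothetical table; val_rev holds the reversed string.
CREATE TEMPORARY TABLE reversed_demo (
  val_rev VARCHAR(20),
  UNIQUE KEY (val_rev)
);

-- REVERSE('a ') is ' a': the trailing space becomes a leading one,
-- so the unique key sees two different values.
INSERT INTO reversed_demo VALUES
(REVERSE('a')),
(REVERSE('a '));

-- Reverse the search term as well, and reverse back for display.
SELECT CONCAT('[', REVERSE(val_rev), ']') AS original
FROM reversed_demo
WHERE val_rev = REVERSE('a ');
```

This only moves the problem: a value with significant leading spaces ends up with trailing spaces after reversal, so the trick does not help if both ends matter.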

For equality searches you might also store the Base64 encoding of the string, since the encoded value never ends in a space. Note that standard Base64 does not maintain the lexicographical order (the order between a and b is not guaranteed to hold between base64(a) and base64(b)), so this is not suitable for sorting or range queries. Or you might append a terminator to the string ("\n" could do well and is unlikely to appear in searches).
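The terminator variant could be sketched as follows. The table terminated_demo is a hypothetical name, and the column is declared one character wider than the data to leave room for the terminator.

```sql
-- Hypothetical table; each stored value is the real value plus '\n'.
CREATE TEMPORARY TABLE terminated_demo (
  val VARCHAR(21),
  UNIQUE KEY (val)
);

-- Neither stored value ends in a space any more,
-- so the unique key can tell them apart.
INSERT INTO terminated_demo VALUES
(CONCAT('a',  '\n')),
(CONCAT('a ', '\n'));

-- Append the terminator to the search term;
-- drop the last character again for display.
SELECT CONCAT('[', LEFT(val, CHAR_LENGTH(val) - 1), ']') AS original
FROM terminated_demo
WHERE val = CONCAT('a ', '\n');
```

Under the usual collations this should return only the row for 'a '. Every query and every insert has to remember the terminator, which is the main cost of this approach.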

Another option, though it's risky because humans can't tell the difference, is to replace spaces with the UTF-8 non-breaking space, char(49824):

mysql> select concat ('\'a', char(49824),'\'') AS tricked,
              concat ('\'a', ' '        ,'\'') as honest,
              concat ('\'a', char(49824),'\'') =
              concat ('\'a', ' '        ,'\'') as equals;

+---------+--------+--------+
| tricked | honest | equals |
+---------+--------+--------+
| 'a '    | 'a '   |      0 |
+---------+--------+--------+
1 row in set (0.00 sec)

The two strings look identical, but they are not equal. Note that in HTML the space is a plain space, while 49824 is &nbsp; (non-breaking space). This affects functions that convert to and from HTML. Also, because the nbsp is a two-byte UTF-8 sequence, the honest value 'a ' is two bytes long, but the tricked value is actually three.
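The byte-length difference can be checked directly. This is only an illustrative query; the column aliases are made up for the example.

```sql
-- 49824 is 0xC2A0, the UTF-8 byte sequence for the non-breaking space.
SELECT HEX(CHAR(49824))                 AS nbsp_hex,      -- C2A0
       LENGTH('a ')                     AS honest_bytes,  -- 2
       LENGTH(CONCAT('a', CHAR(49824))) AS tricked_bytes; -- 3
```

LENGTH() counts bytes, not characters, which is why the two visually identical values report different sizes.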

Finally you can declare the column VARBINARY instead of VARCHAR, thus completely hiding what's happening. It looks like the easiest solution, but I fear it might bite you some weeks or months down the line.
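A minimal sketch of the VARBINARY option, assuming a hypothetical table varbinary_demo. It also hints at one way this choice could bite later: comparisons become case-sensitive.

```sql
-- Hypothetical table; values are stored and compared as raw bytes.
CREATE TEMPORARY TABLE varbinary_demo (
  val VARBINARY(20),
  PRIMARY KEY (val)
);

-- All three values are distinct byte strings, so the key accepts them.
INSERT INTO varbinary_demo VALUES ('a'), ('a '), ('A');

-- Every byte is significant, including the trailing space,
-- so this should match only the row holding 'a '.
SELECT CONCAT('[', val, ']') AS val
FROM varbinary_demo
WHERE val = 'a ';
```

A search for 'A' would no longer find 'a', and sorting follows byte values rather than a collation.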

answered Oct 19 '22 21:10 by LSerni


I have had success doing the following, but I'm not sure if it is an unstable approach.

CREATE TEMPORARY TABLE test (
  PRIMARY KEY(id),
  id INT AUTO_INCREMENT,
  val VARCHAR(20)
);

INSERT INTO test VALUES
(NULL, 'a'),
(NULL, 'a '),
(NULL, 'a  '),
(NULL, 'a   ');

SELECT * FROM test
WHERE val LIKE 'a ';

Output

id  val
2   'a '

Using WHERE val = 'a ' selects all four entries, because trailing spaces are not taken into consideration, but LIKE works for me.
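One caveat with LIKE is that % and _ in the search term act as wildcards, so they need escaping when an exact match is wanted. A small sketch, using a hypothetical table like_demo:

```sql
-- Hypothetical table for illustration.
CREATE TEMPORARY TABLE like_demo (
  val VARCHAR(20)
);

INSERT INTO like_demo VALUES ('a_'), ('a_ '), ('ab ');

-- Unescaped: _ matches any single character,
-- so both 'a_ ' and 'ab ' match.
SELECT CONCAT('[', val, ']') AS val
FROM like_demo
WHERE val LIKE 'a_ ';

-- Escaped: \_ matches a literal underscore,
-- so only 'a_ ' matches.
SELECT CONCAT('[', val, ']') AS val
FROM like_demo
WHERE val LIKE 'a\_ ';
```

In both queries the trailing space in the pattern is significant, which is why 'a_' (without the space) is never returned.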

answered Oct 19 '22 21:10 by Mark