Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mysql Select with LIKE clause is not working Chinese characters

Tags:

select

mysql

I have data stored in single column which are in English and Chinese.

the data is separated by the separators e.g. for Chinese

<!--:zh-->日本<!--:-->

for English

<!--:en-->English Characters<!--:-->

I would show the content according to users selected language.

I made a query like this

SELECT * FROM table WHERE content LIKE '<!--:zh-->%<!--:-->' 

The query above works but return empty result set.

Collation of content column is utf8_general_ci

I have also tried using the convert function like below

SELECT * FROM table WHERE CONVERT(content USING utf8) 
                           LIKE CONVERT('<!--:zh-->%<!--:-->' USING utf8)

But this also does not work.

I also tried running the query SET NAMES UTF8 but still it does not work.

I am running queries in PhpMyAdmin if it does matter.


qTranslate did not change the database used by WordPress. Translation data is stored in original fields. For that reason there is each field containing all translations for that special field and the data is like this

<!--:en-->English Characters<!--:--><!--:zh-->日本<!--:-->

http://wpml.org/documentation/related-projects/qtranslate-importer/

like image 935
Daric Avatar asked Nov 21 '12 11:11

Daric


2 Answers

Test table data for content

<!--:zh-->日本<!--:--><!--:en-->English Characters<!--:-->
<!--:en-->English Characters<!--:--><!--:zh-->日本<!--:-->
<!--:zh-->日本<!--:-->
<!--:en-->English Characters<!--:-->

followed by

I have data stored in single column which are in English and Chinese

and your select should look like this

SELECT * FROM tab
WHERE content LIKE '%<!--:zh-->%<!--:-->%'

SQL Fiddle DEMO (also with demo how to get the special language part out of content)

SET @PRE = '<!--:zh-->', @SUF = '<!--:-->';

SELECT
  content,
  SUBSTR( 
    content, 
    LOCATE( @PRE, content ) + LENGTH( @PRE ), 
    LOCATE( @SUF, content, LOCATE( @PRE, content ) ) - LOCATE( @PRE, content ) - LENGTH( @PRE ) 
  ) langcontent
FROM tab
WHERE content LIKE CONCAT( '%', @PRE, '%', @SUF, '%' );

as stated in MySQL Documentation and follow the example of

SELECT 'David!' LIKE '%D%v%';
like image 121
Sir Rufo Avatar answered Nov 07 '22 09:11

Sir Rufo


As others have pointed, your queries seem to be fine, so I'd look somewhere else. This is something you can try:

I'm not sure about chinese input, but for japanese, many symbols have full-width and half-width variants, for example: "hello" and "hello" look similar, but the codepoints of their characaters are different, and therefore won't compare as equal. It's very easy to mistype something in full-width, and very difficult to detect, especially for whitespace. Compare " " and " ".

You are probably storing your data in half width and querying it in full width. Even if one character is different (especially spaces are difficult to detect), the query will not find your desired data.

There are many ways to detect this, for instance try copying the data and query into text files verbatim, and view them with hex editors. If there is a single bit difference in the relevant parts, you may be dealing with this problem.

like image 3
Panda Pajama Avatar answered Nov 07 '22 10:11

Panda Pajama