
Selecting rows that contain Non-ASCII characters

Tags: regex, php, mysql

Here is the issue: I have imported about 20000 game descriptions from mochimedia into my database, but there are many foreign games, which I do not want to list.

I came up with this query to find rows with non-ASCII characters:

SELECT * FROM TABLE WHERE NOT HEX(COLUMN) REGEXP '^([0-7][0-9A-F])*$';

Note that I found this solution here on Stack Overflow, as I am not an expert when it comes to MySQL queries.

However, while this query catches quite a few foreign descriptions, it also returns descriptions that are perfectly fine, so I am looking for a way to fine-tune the query so that it skips the "okay" ones.

Here are a few returned rows that are "okay", meaning they should not be returned:

Game Boy Jam game that uses game boy restrictions. It’s a western platform game, where you play as a sheriff of the town. Your mission is to capture all the bad bandits in the land and bring them to justice.

and one more

It's hard to be a kitten if you have such a clumsy owner! Yesterday she lost a lot of things in the park and now it's up to you to find them!

Memories of that day can be helpful – you should remember where have you seen that thing last and search there.Map also can be usefull for your task. And finally you can climb up a tree and ask a big cat for a hint – you will see all the events of that day again.

But sometimes it's not enough to just find a lost thing. Some residents of the park may already be using it for themselves – be it mice or ants. In that case you may have to bring them something in exchange for a lost thing – only then you will get it back.

and one last example

Hungry honey bee is a unique fun game. It includes the fun of a platform game, puzzle game, adventure game, role playing game. In this fantasy game, one needs to make honey bee to collect all the flowers in order to win a match. As level progresses new challenges will be introduced with gradually toughness. Overall it’s a complete blend of fun which makes one stick with the game for hours. GOI: Rating 4.5 our of 5

Please remember that I am not a MySQL expert, so I can only guess what the issue is. My guess is that characters like the ’ in It’s, or the characters – and :, might cause this.
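One way to check this guess is to look at the bytes of those characters directly. The following is a minimal sketch, assuming the connection character set is utf8mb4 (with a different character set the hex values will differ):

```sql
-- Assumption: the connection character set is utf8mb4.
SELECT HEX('’') AS curly_apostrophe,  -- E28099
       HEX('–') AS en_dash,           -- E28093
       HEX(':') AS colon;             -- 3A
```

The pattern '^([0-7][0-9A-F])*$' only accepts bytes from 00 to 7F. The curly apostrophe and the en dash both start with the byte E2, so any row containing them is reported as non-ASCII. The colon (3A) is plain ASCII and does not trigger the match.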

Maybe someone would be willing to share an optimized query to solve this problem? I have spent quite some time on this, but since I am still a newbie with PHP and definitely not an expert with REGEXP and MySQL queries, it would be nice to get some help here so I can improve my knowledge. Please do not assume that I will understand everything if you just throw it at me, so detailed help would be wonderful.

Thanks for your time reading this.

Marcus Weller asked Feb 15 '23 11:02


1 Answer

If you're simply trying to find rows in which a column contains non-ASCII characters, you can use the query below:

SELECT * 
FROM table 
WHERE column != CONVERT(column USING ASCII);
ChoNuff answered Feb 27 '23 10:02
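Note that the CONVERT comparison, like the HEX pattern in the question, still returns rows whose only non-ASCII characters are typographic punctuation such as ’ or –, because CONVERT(... USING ASCII) replaces them with ? and the two values then differ. One possible way to tolerate such characters is to strip them before testing. This is only a sketch: the table name games and the column name description are hypothetical, the list of tolerated characters is an assumption to adjust to your data, and both the column and the connection are assumed to use utf8mb4.

```sql
-- Hypothetical names: table `games`, column `description`.
-- Assumes both the column and the connection use utf8mb4.
SELECT *
FROM games
WHERE HEX(
        -- Strip the punctuation we want to tolerate before testing.
        REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(description,
          '‘', ''), '’', ''), '“', ''), '”', ''), '–', '')
      ) NOT REGEXP '^([0-7][0-9A-F])*$';
```

Rows that contain only ASCII plus the stripped characters are no longer returned. Rows with any other non-ASCII character are still returned.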