I need to search a text (around 500 words long) for words in an English dictionary (around 275,000 keywords) to detect non-English words. The query I am using right now is not really optimized and takes more than 10 seconds to execute (there's a words table and a texts table):
SELECT word FROM words WHERE 'The quick brown fox jumps over the lazy dog' LIKE CONCAT( '%', word, '%' );
Got the idea from here.
I have already indexed the word field, and I have seen some examples of people storing the text in the database or putting it directly in the query.
Other examples showed people using FULLTEXT search, although with 300k words I don't think FULLTEXT will work. I guess it's good for searching with boolean logic such as +brown +lazy -apple, but in my case I don't need much logic.
Another example I've seen is to concatenate the words into an IN (...) clause, although having 500m keywords the query would just be insanely long.
Any ideas what to do?
Right now the text is saved as a text field and the words as varchar(50), in InnoDB with the utf8_unicode_ci collation. I've heard InnoDB is slow, so I could use MyISAM or any other engine. I am using MySQL 5.5, although I could update to 5.6 if that helped.
LIKE comparisons are basically just wildcard-capable equality tests. They are not a generic keyword search engine.
WHERE foo LIKE '%a b%'
would find any records that contain the literal text "a b" anywhere in the foo field. It does not look for "a" or "b" separately: "a b" is a single monolithic "word", and that word is searched for in its entirety.
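To make the literal-substring behaviour concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs without a server; the % wildcard behaves the same way for this case). The table t and its rows are hypothetical.

```python
import sqlite3

# Hypothetical table and rows, used only to illustrate how LIKE matches.
# SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (foo TEXT)")
conn.executemany(
    "INSERT INTO t (foo) VALUES (?)",
    [("x a b y",), ("a then b",), ("only a",), ("only b",)],
)

# '%a b%' matches only rows that contain the literal substring "a b".
# Rows that contain "a" and "b" separately do not match.
rows = conn.execute(
    "SELECT foo FROM t WHERE foo LIKE '%a b%' ORDER BY foo"
).fetchall()
literal_matches = [row[0] for row in rows]
print(literal_matches)
```

Only the row containing "a b" as one contiguous piece of text is returned; "a then b" is not, even though it contains both letters.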
If you want to search for multiple "words" using LIKE, you have to do
WHERE foo LIKE '%a%' OR foo LIKE '%b%' OR etc...
which quickly gets ugly and is extremely inefficient: a pattern that starts with a % wildcard cannot use an index.
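As a rough illustration of how such an OR chain grows with the number of terms, the sketch below builds one programmatically. It again uses Python's sqlite3 as a stand-in for MySQL; the helper like_any, the table t, and its rows are all hypothetical names introduced for this example.

```python
import sqlite3

def like_any(column, terms):
    """Build 'col LIKE ? OR col LIKE ? ...' and its parameter list.

    The column name is interpolated directly, so it must come from
    trusted code, never from user input. The terms are bound as parameters.
    """
    if not terms:
        raise ValueError("at least one search term is required")
    clause = " OR ".join(f"{column} LIKE ?" for _ in terms)
    params = [f"%{term}%" for term in terms]
    return clause, params

# Hypothetical data; SQLite stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (foo TEXT)")
conn.executemany(
    "INSERT INTO t (foo) VALUES (?)",
    [("a then b",), ("only a",), ("only b",), ("neither",)],
)

clause, params = like_any("foo", ["a", "b"])
rows = conn.execute(
    f"SELECT foo FROM t WHERE {clause} ORDER BY foo", params
).fetchall()
any_matches = [row[0] for row in rows]
print(clause)
print(any_matches)
```

Each extra term adds another LIKE condition that has to be checked against every row, which is why this approach does not scale to a large word list.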
You'd be better off switching to a fulltext search system instead, where you can have the far simpler
WHERE MATCH(foo) AGAINST ('a b')