I'm wondering whether the major SQL engines (MS SQL, Oracle, MySQL) can recognize that two words are related because they share the same root.
We know it's easy to match "networking" when searching for "network" because the latter is a substring of the former.
But do SQL engines have functions that can match "network" when searching for "networking"?
Thanks a lot.
This functionality is called a stemmer: an algorithm that can deduce a stem from any form of the word.
This can be quite complex: for instance, the Russian words шёл and иду are different forms of the same verb, even though they do not share a single letter (ironically, the same is true of English: went and go).
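To make the idea of a stemmer concrete, here is a minimal Python sketch. It is only a toy: the suffix list, the irregular-form table, and the names `stem` and `same_root` are assumptions made up for illustration, not the algorithm any SQL engine ships. It shows the two ingredients mentioned above: stripping regular suffixes and falling back to a lookup table for forms like went/go that share no letters.

```python
# Toy suffix-stripping stemmer: illustrative only, not the Porter algorithm
# and not what any SQL engine actually uses.
SUFFIXES = ("ing", "ed", "s")  # hypothetical, deliberately tiny rule set

# Irregular forms cannot be derived by stripping suffixes,
# so they need an explicit lookup table (hypothetical entries).
IRREGULAR = {"went": "go", "gone": "go"}


def stem(word: str) -> str:
    """Return a crude stem for an English word."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        # Keep at least three characters so short words
        # such as "sing" or "red" are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def same_root(a: str, b: str) -> bool:
    """Two words match if they reduce to the same stem."""
    return stem(a) == stem(b)
```

With this sketch, `same_root("networking", "network")` and `same_root("went", "go")` both return `True`. A real stemmer needs far more rules per language, which is why full-text engines treat it as a pluggable component.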
Word breaking can also be quite a complex task for some languages that use no spaces between words.
SQL Server allows using pluggable stemmers and word breakers for its full-text search engine:
http://msdn.microsoft.com/en-us/library/ms142509.aspx
I think the topic is 'Semantic Similarity'. There are several efforts trying to find optimal solutions to this problem.