I am looking at SQL Server 2016 full-text indexes, and they are great for searching for strings that contain multiple words.
When I try to create the full-text index, it shows Statistical Semantics as a checkbox. What does Statistical Semantics do?
Moreover, I want to support "did you mean" queries.
For example, let's say I have a record "house". The user types "hause".
Can I use a full-text index to return "house" as the closest match for "hause" and show the user "did you mean house?" efficiently? Thank you.
I have tried SOUNDEX, but the results it generates are terrible. It returns many unrelated words.
Since there are so many records in my database and I need very fast results, I need something that SQL Server supports natively.
Any ideas? Is there any way to achieve this using indexes?
I know there are multiple algorithms, but they are not efficient enough for me to use online, for example calculating the edit distance between the search term and every record. They could be used for offline projects, but I need this efficiency in an online dictionary that will receive thousands of requests constantly.
I already have a plan in mind: store the not-found search terms in the database, calculate their closest matches offline, and use those as a cache. However, I wonder whether any online/live solution exists. Consider that there will be over 100M nvarchar records.
What is a Full Text Index? A full-text index is a special type of index that provides index access for full-text queries against character or binary column data. A full-text index breaks the column into tokens and these tokens make up the index data.
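To make the idea of tokens as index data concrete, here is a minimal Python sketch of a toy inverted index. It is only an illustration and not how SQL Server implements full-text indexes; the names `tokenize`, `build_index`, `search_all` and the sample `rows` are all made up for this example.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())

def build_index(rows):
    """Map each token to the set of row ids whose text contains it."""
    index = defaultdict(set)
    for row_id, text in rows.items():
        for token in tokenize(text):
            index[token].add(row_id)
    return index

def search_all(index, query):
    """Return the ids of rows that contain every token of the query."""
    postings = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*postings) if postings else set()

# Hypothetical sample data standing in for an indexed text column.
rows = {
    1: "Small house near the lake",
    2: "Lake view apartment",
    3: "House with a garden",
}
index = build_index(rows)

print(search_all(index, "lake house"))  # rows containing both words
print(search_all(index, "hause"))       # a misspelled token has no entry
```

A lookup is a direct match on whole tokens, which is why a misspelling such as "hause" finds nothing: that token was never stored in the index.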
Full-text search refers to searching some text inside extensive text data stored electronically and returning results that contain some or all of the words from the query. In contrast, traditional search would return exact matches.
Full-text search is the most common technique used in web search engines. Each page is crawled and indexed, and matches for a query are found through the index. Results usually show the parts of the original text that match the user's query first, and then the full text.
Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on the rules of a particular language such as English or Japanese. Full-text queries can include simple words and phrases or multiple forms of a word or phrase.
The short answer is no: Full-Text Search cannot find words that are similar to the search term but spelled differently.
Full Text Search uses stemmers and thesaurus files:
The stemmer generates inflectional forms of a particular word based on the rules of that language (for example, "running", "ran", and "runner" are various forms of the word "run").
A Full-Text Search thesaurus defines a set of synonyms for a specific language.
Both the stemmers and the thesaurus are configurable, and you can easily have FT match "house" for a search on "hause", but only if you added "hause" as a synonym for "house". This is obviously a non-solution, as it requires you to add every possible typo as a synonym...
Semantic search is a different topic: it allows you to search for documents that are semantically close to a given example.
What you want is to find records that have a short Levenshtein distance from a given word (a.k.a. "fuzzy" search). I don't know of any technique for creating an index that can answer a Levenshtein search. If you're willing to scan the entire table for each term, T-SQL and CLR implementations of Levenshtein exist.
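For illustration, the full-scan approach can be sketched in a few lines of Python. This is not one of the T-SQL or CLR implementations mentioned above, just a minimal sketch of the same idea; `levenshtein`, `did_you_mean`, the `max_distance` threshold and the sample `words` list are hypothetical names and values chosen for the example.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]

def did_you_mean(term, words, max_distance=2):
    """Scan every word and return those within max_distance, closest first."""
    scored = [(levenshtein(term, word), word) for word in words]
    return [word for distance, word in sorted(scored) if distance <= max_distance]

# Hypothetical dictionary standing in for the table being searched.
words = ["house", "horse", "mouse", "garden"]
suggestions = did_you_mean("hause", words)
print(suggestions[0])  # house
```

Each lookup computes a distance against every word, so the cost grows linearly with the table size. That is the scan the answer warns about, and with 100M rows it is unlikely to be fast enough per request without some form of precomputation or caching.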