Is it possible to use a full-text index to find closest-match strings? What does Statistical Semantics do in Full-Text Indexing?

I am looking at SQL Server 2016 full-text indexes, and they are great for finding strings that contain multiple words.

When I try to create the full-text index, it shows Statistical Semantics as a checkbox. What does Statistical Semantics do?

Moreover, I want to support "did you mean" queries.

For example, let's say I have a record "house" and the user types "hause".

Can I use a full-text index to efficiently return "house" as the closest match for "hause" and show the user "did you mean house?" Thank you.

I have tried SOUNDEX, but the results it generates are terrible.

It returns many unrelated words.

Since there are so many records in my database and I need very fast results, I need something SQL Server supports natively.

Any ideas? Is there any way to achieve this using indexes?

I know there are multiple algorithms, but they are not efficient enough for me to use online, such as calculating the edit distance against every record. They could be used for offline projects, but I need this efficiency in an online dictionary that will receive thousands of requests constantly.

I already have a plan in mind: store not-found queries in the database, calculate their closest matches offline, and use those results as a cache. However, I wonder whether an online/live solution exists. Consider that there will be over 100 million nvarchar records.
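To make the plan concrete, here is a minimal sketch of the miss cache in Python. It is only an illustration: the `SuggestionCache` class and its method names are hypothetical, an in-memory set and dict stand in for database tables, and the offline step uses the standard library's `difflib.get_close_matches` (a similarity ratio, not true edit distance) as a placeholder for whatever matching algorithm is eventually chosen.

```python
import difflib


class SuggestionCache:
    """Hypothetical miss cache: online lookups never compute distances."""

    def __init__(self, dictionary):
        self.dictionary = set(dictionary)  # stands in for the indexed word table
        self.pending = set()               # not-found terms awaiting the offline job
        self.suggestions = {}              # not-found term -> closest word (or None)

    def lookup(self, term):
        """Return (exact_match, suggestion); both lookups are O(1)."""
        if term in self.dictionary:
            return term, None
        if term in self.suggestions:
            return None, self.suggestions[term]
        self.pending.add(term)             # remember the miss for later
        return None, None

    def run_offline_batch(self):
        """Expensive step, meant to run outside the request path."""
        words = sorted(self.dictionary)
        for term in self.pending:
            matches = difflib.get_close_matches(term, words, n=1, cutoff=0.6)
            self.suggestions[term] = matches[0] if matches else None
        self.pending.clear()


cache = SuggestionCache(["house", "horse", "mouse"])
first = cache.lookup("hause")    # first miss: nothing to suggest yet
cache.run_offline_batch()
second = cache.lookup("hause")   # later request: served from the cache
```

The first request for an unknown term gets no suggestion; only repeat misses benefit, which is the trade-off of this design.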


asked Mar 22 '17 09:03

MonsterMMORPG

1 Answer

The short answer is no: Full-Text Search cannot find words that are similar but different.

Full Text Search uses stemmers and thesaurus files:

The stemmer generates inflectional forms of a particular word based on the rules of that language (for example, "running", "ran", and "runner" are various forms of the word "run").

A Full-Text Search thesaurus defines a set of synonyms for a specific language.

Both stemmers and thesaurus files are configurable, and you can easily have Full-Text Search match house for a search on hause, but only if you add hause as a synonym for house. This is obviously a non-solution, as it requires you to add every possible typo as a synonym.

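Semantic Search is a different topic: it allows you to search for documents that are semantically close to a given example. That is what the Statistical Semantics checkbox enables. It builds on the full-text index to extract key phrases and find similar documents, and it does not help with misspellings.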

What you want is to find records that have a short Levenshtein distance from a given word (a.k.a. 'fuzzy' search). I don't know of any technique for creating an index that can answer a Levenshtein search. If you're willing to scan the entire table for each term, T-SQL and CLR implementations of Levenshtein exist.
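To illustrate what such a scan computes, here is a minimal sketch in Python rather than T-SQL or CLR. The function names `levenshtein` and `closest_matches` and the `max_distance` parameter are my own, and a plain list stands in for the table. It uses the standard dynamic-programming formulation and compares the term against every word, which is exactly the full scan that becomes expensive at 100 million rows.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute), keeping two rows of the DP table."""
    if len(a) < len(b):
        a, b = b, a                      # keep the inner row as short as possible
    previous = list(range(len(b) + 1))   # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]                    # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def closest_matches(term, words, max_distance=1):
    """Full scan: score every word, keep those within max_distance, nearest first."""
    scored = ((levenshtein(term, w), w) for w in words)
    return sorted((d, w) for d, w in scored if d <= max_distance)


words = ["house", "horse", "mouse", "hose"]   # stand-in for the table
suggestions = closest_matches("hause", words)
```

With this word list, only "house" is within one edit of "hause"; the other three words are two edits away.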

answered Oct 24 '22 14:10

Remus Rusanu


Tags:

sql

sql-server

full-text-search

full-text-indexing

statistical-semantics
