Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Retrieving SQL Server Full Text Index terms

Using Lucene, one can retrieve the terms contained within in an index, i.e. the unique, stemmed words, excluding stop-words, that documents in the index contain. This is useful for generating autocomplete suggestions amongst other things. Is something similar possible with MS SQL Server full text indices?

like image 672
friism Avatar asked Nov 13 '08 14:11

friism


People also ask

How do I get full-text indexes in SQL Server?

Select columns name and language types for columns. You can only select character based and image based columns. Select change tracking. Now select the full-text catalog for index.

How do I find the full-text catalog in SQL Server?

To get to this screen, go to the database and then go to Storage > Full Text Catalogs, find the full text catalog you want to work with and right click on it and select Properties.

Is full-text indexed?

A full-text index is a special type of index that provides index access for full-text queries against character or binary column data. A full-text index breaks the column into tokens and these tokens make up the index data.


3 Answers

You can use the new system view in SQL Server 2008 to get you the terms and count of occurrences, is this what you want?

sys.dm_fts_index_keywords_by_document
( 
    DB_ID('database_name'),     
    OBJECT_ID('table_name') 
)

You need to supply the db_id and object_id of the fulltext table. This is the MSDN link for this: sys.dm_fts_index_keywords_by_document.

like image 54
Coolcoder Avatar answered Sep 19 '22 03:09

Coolcoder


I agree that this information (words in the index, stemmed words, etc.) is usefull - and if SQL Server is serious about offering a serch platform, this information needs to be exposed. It's really not available in previous versions, as far as I can tell. However, the game changes in SQL Server 2008.

SQL Server 2008 offers new dynamic management views that offer this metadata for full text. Pay particular note to sys.dm_fts_parser and sys.dm_fts_index_keywords.

The sys.dm_fts_parser view takes in a phrase, along with a couple of other parameters and outputs a table showing a row set, showing stemmed versions of the individual words after the word breaker has deemed them as separate words.

MSDN gives the example of this query against the view:

SELECT * FROM sys.dm_fts_parser (' "The Microsoft business analysis" ', 1033, 0, 0)

To get the keywords, you can use sys.dm_fts_index_keywords.

I hope that points you in the right direction. Cheers.

like image 35
Newfave Avatar answered Sep 20 '22 03:09

Newfave


With the following you can take the table name and column name as variables. It will work on the current database that it is run, omit end of file results and order the results descending by number of occurrences.

Note that full text indexing must be enabled on that table and column

DECLARE @TableName NVARCHAR(200) = 'MyTable'
DECLARE @ColumnName NVARCHAR(200) = 'MyColumn'

SELECT DISTINCT(display_term), occurrence_count FROM sys.dm_fts_index_keywords_by_document
( 
    DB_ID(),     
    OBJECT_ID(@TableName) 
)
WHERE column_id =
(
    SELECT TOP 1 col.column_id FROM sys.objects obj
    INNER JOIN sys.columns col ON col.object_id = obj.object_id
    WHERE obj.name = @TableName
    AND col.name = @ColumnName
)
AND keyword != 0xFF
ORDER BY occurrence_count DESC
like image 37
tic Avatar answered Sep 20 '22 03:09

tic