I'm working on a knowledge base project using SQL Server 2008 Full Text Search Engine. Project included in articles and files where each article has multiple files.In those articles whole content is pure html.
Right now,I successfully created fulltext catalog and index on SQL Server 2008 and my database is version 10 compatible.
Here are my questions:
1)Is it possible to ignore html tags,more clearly texts containing in "<...>", while searching in these articles,because if i wish to search for div,table etc. there should be no result returned?
2)Articles will be updated anytime,so full text index must be updated when a new record is inserted.Is it enough to set only "TRACK CHANGES AUTOMATIC" while creating full text catalog?
3)We may use FILESTREAM feature hereafter,does SQL Server 2008 have a good performance on files using full text index? What specific document types does SQL Server 2008 good on indexing?
Regards
Full-Text Search in SQL Server and Azure SQL Database lets users and applications run full-text queries against character-based data in SQL Server tables.
When FREETEXT is used, the full-text query engine internally performs the following actions on the freetext_string, assigns each term a weight, and then finds the matches: Separates the string into individual words based on word boundaries (word-breaking). Generates inflectional forms of the words (stemming).
there is a filter for .htm and .html files.
to see if you have the filter installed run this sql:
SELECT * FROM sys.fulltext_document_types
you should see:
.htm E0CA5340-4534-11CF-B952-00AA0051FE20 C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Binn\nlhtml.dll 12.0.6828.0 Microsoft Corporation
.html E0CA5340-4534-11CF-B952-00AA0051FE20 C:\Program Files\Microsoft SQL Server\MSSQL10.MSSQLSERVER\MSSQL\Binn\nlhtml.dll 12.0.6828.0 Microsoft Corporation
so, if you can convert your articles column to varbinary(max), then you can add a full text index on it and specify a doc type of '.html'
once the index has populated, you can verify the keywords using this sql:
SELECT display_term, column_id, document_count
FROM sys.dm_fts_index_keywords
(DB_ID('your_db'), OBJECT_ID('your_table'))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With