Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Full Text not indexing varbinary column (with html)

I have a table with HTML data, that I want to search using the Full Text Index via an html-filter

So I created an index:

CREATE FULLTEXT CATALOG myCatalog AS DEFAULT

CREATE FULLTEXT INDEX ON myTable (Body TYPE COLUMN Filetype)
                        KEY INDEX PK_myTable

Body is a varbinary(max) column with HTML. The Filetype column is a computed column returns .html.

No results are being returned.

I verified that .html filter is installed. FullText index is also installed properly and works fine if I convert the column to nvarchar and create just a "plain text" index (not html).

No errors in the SQL log or FTS log.

The keywords table is just empty!

SELECT *
FROM sys.dm_fts_index_keywords
(DB_ID('myDatabase'), OBJECT_ID('myTable'))

All it returns is "END OF FILE" symbol.

enter image description here

It says "document count 35" which mean the documents were processed, but no keywords were extracted.

PS. I have SQL Server Express Edition 2012 (with all advanced features including full text). Can this be the reason? But again, the "plain" full text search works just fine!

PPS. Asked my coworker to test this on SQL Express 2016 - same result... Tried on our production server "Enterprise" edition - same.

UPDATE

OK it turns out the full text index DOES NOT SUPPORT UNICODE!! in varbinary columns. When I converted the column to non-unicode (by converting it to nvarchar then to varchar and then back to varbinary) It started working.

Anyone knows any workarounds?

like image 809
Alex from Jitbit Avatar asked Mar 05 '23 17:03

Alex from Jitbit


1 Answers

OK, so it turns out fulltext index DOES support unicode data in varbinary but pay attention to this:

If your varbinary column is created from Nvarchar be sure to include the 0xFFFE unicode signature at the beginning

For example, I'm using a computed column for full text index, so I had to change my computed column to this:

alter table myTable
add FTS_Body as 0xFFFE + (CAST(HtmlBody as VARBINARY(MAX)))
--HtmlBody is my nvarchar column that contains html
like image 113
Alex from Jitbit Avatar answered Mar 15 '23 22:03

Alex from Jitbit