Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ms sql - why does GO affect my select (Free Text Search)

I am experiencing a very weird behaviour from the management studio GO.
I am following the full-Text Search tutorial and I run the following query over adventure works

 SELECT FT_TBL.ProductDescriptionID,
 FT_TBL.Description, 
 KEY_TBL.RANK
 FROM Production.ProductDescription AS FT_TBL INNER JOIN
    CONTAINSTABLE (Production.ProductDescription,
      Description, 
      '(light NEAR aluminum) OR
      (lightweight NEAR aluminum)'
    ) AS KEY_TBL
    ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
 WHERE KEY_TBL.RANK > 2
 ORDER BY KEY_TBL.RANK DESC;

When I run it with Go (before the query ) and without Go and the results are different. I played with it some more - copied the select and added go between and the results are still different. Notice from the picture that it is the same select written twice but with two different results

enter image description here

Any idea?

EDIT: Just discovered the execution plan is different because of the go
------LINK----------

enter image description here

like image 757
Bick Avatar asked May 27 '13 14:05

Bick


People also ask

How do I know if full text search is enabled?

A: You can determine if Full-Text Search is installed by querying the FULLTEXTSERVICEPROPERTY like you can see in the following query. If the query returns 1 then Full-Text Search is enabled.

How does full text search work in SQL Server?

Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on the rules of a particular language such as English or Japanese. Full-text queries can include simple words and phrases or multiple forms of a word or phrase.

What is free text search in SQL Server?

When FREETEXT is used, the full-text query engine internally performs the following actions on the freetext_string, assigns each term a weight, and then finds the matches: Separates the string into individual words based on word boundaries (word-breaking). Generates inflectional forms of the words (stemming).


2 Answers

I can't reproduce this on my machine but suspect this is by design with the error being filtering WHERE KEY_TBL.RANK > 2

The topic How Search Query Results Are Ranked (Full-Text Search) states

This rank value is calculated on every row and can be used as an ordering criteria to sort the result set of a given query by relevance. The rank values indicate only a relative order of relevance of the rows in the result set. The actual values are unimportant and typically differ each time the query is run. The rank value does not hold any significance across queries.

In the second query all values seem to be scaled down to FLOOR(20% * RANK) from the first query. As this is only a relative scale filtering on an absolute RANK value seems likely to cause this kind of issue where matching rows arbitrarily drop in and out of the results.

like image 32
Martin Smith Avatar answered Sep 23 '22 15:09

Martin Smith


I can reproduce this

Initial tests

Assuming a GO..SELECT..GO pattern and some blank lines

Before     After    Rows
y          n        6
y          y        4
n          y        4
n          n        4

If I change the order of events...

Before     After    Rows
y          n        6
y          y        4
n          n        4
n          y        4

And again

Before     After    Rows
y          y        4
n          n        4
n          y        4
y          n        6

Before     After    Rows
y          y        4
n          n        4
y          n        6
n          y        4

Update after @MartinSmith's answer

If I remove the WHERE, then I get the same results and the same rank value for four queries regardless of blank lines. With the WHERE I get the same results as above.

Martin showed that rank is arbitrary, but his same link also states

This explains why the same query can return different rank results over time as full-text indexed data is added, modified, and deleted, and as the smaller indexes are merged.

In this case, data is static and the query is identical except for blank lines.

Changing the query text

  • With WHERE KEY_TBL.RANK > 0 = identical row count and rank values
  • No WHERE clause = identical row count and rank values
  • With the CONTAINSTABLE wrapped in TOP 200000..ORDER BY RANK = identical row count and rank values, regardless of WHERE

With original WHERE KEY_TBL.RANK > 2

  • Change the case of the first FT_TBL alias to be different each time (fT_TBL, Ft_TBL, FT_tBL, FT_TbL) which means a different query plan = identical row count and rank values
  • Changing all FT_TBL aliases to FT_TBl = identical row count and rank values
  • Changing all FT_TBl aliases back to FT_TBL = different rank values based on blank lines -> different row counts

For the last query (different rank values) I get 4 identical execution plans. The filter is applied to the TVF before the JOIN.

  • Adding a blank line after the ON before the WHERE (keeping leading and trailing blank lines) = identical row count and rank values
  • Adding a blank line before FROM = identical row count and rank values
  • Adding 3 spaces after JOIN = identical row count and rank values

Adding DBCC DROPCLEANBUFFERS

  • Adding DBCC DROPCLEANBUFFERS at the top 4-4-6-4 rows
  • Same if I add DBCC DROPCLEANBUFFERS before each query
  • Adding an extra blank lines (with the DBCC) before or after the first query gives more rows (6-4-6-4). Found this by mistake, but leads back to the first observation that blank lines affect results

After Paul's comment

Note the trailing and leading blank lines.

This fails reliably. The second query gives 4 rows

GO

SELECT FT_TBL.ProductDescriptionID,
   FT_TBL.Description, 
   KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
   CONTAINSTABLE (Production.ProductDescription,
      Description, 
      '(light NEAR aluminum) OR
      (lightweight NEAR aluminum)'
   ) AS KEY_TBL
   ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;

GO
SELECT FT_TBL.ProductDescriptionID,
   FT_TBL.Description, 
   KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
   CONTAINSTABLE (Production.ProductDescription,
      Description, 
      '(light NEAR aluminum) OR
      (lightweight NEAR aluminum)'
   ) AS KEY_TBL
   ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;
GO

SELECT FT_TBL.ProductDescriptionID,
   FT_TBL.Description, 
   KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
   CONTAINSTABLE (Production.ProductDescription,
      Description, 
      '(light NEAR aluminum) OR
      (lightweight NEAR aluminum)'
   ) AS KEY_TBL
   ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;
GO
SELECT FT_TBL.ProductDescriptionID,
   FT_TBL.Description, 
   KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
   CONTAINSTABLE (Production.ProductDescription,
      Description, 
      '(light NEAR aluminum) OR
      (lightweight NEAR aluminum)'
   ) AS KEY_TBL
   ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;

GO

This gives the same every time after I shift+TAB the entire query to remove indents

GO

SELECT FT_TBL.ProductDescriptionID,
FT_TBL.Description, 
KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
CONTAINSTABLE (Production.ProductDescription,
Description, 
'(light NEAR aluminum) OR
(lightweight NEAR aluminum)'
) AS KEY_TBL
ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;

GO
SELECT FT_TBL.ProductDescriptionID,
FT_TBL.Description, 
KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
CONTAINSTABLE (Production.ProductDescription,
Description, 
'(light NEAR aluminum) OR
(lightweight NEAR aluminum)'
) AS KEY_TBL
ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;
GO

SELECT FT_TBL.ProductDescriptionID,
FT_TBL.Description, 
KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
CONTAINSTABLE (Production.ProductDescription,
Description, 
'(light NEAR aluminum) OR
(lightweight NEAR aluminum)'
) AS KEY_TBL
ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;
GO
SELECT FT_TBL.ProductDescriptionID,
FT_TBL.Description, 
KEY_TBL.RANK
FROM Production.ProductDescription AS FT_TBL INNER JOIN
CONTAINSTABLE (Production.ProductDescription,
Description, 
'(light NEAR aluminum) OR
(lightweight NEAR aluminum)'
) AS KEY_TBL
ON FT_TBL.ProductDescriptionID = KEY_TBL.[KEY]
WHERE KEY_TBL.RANK > 2
ORDER BY KEY_TBL.RANK DESC;

GO

Still looking

Ok, time to search MS Connect. New one: https://connect.microsoft.com/SQLServer/feedback/details/788691/blank-lines-before-and-after-full-text-query-change-results

like image 198
gbn Avatar answered Sep 24 '22 15:09

gbn