Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Full-text Index stop list for Persian language

I need to customize Full-text search for Persian language. And customize Stop/Noise words and synonyms for this language.

My SQL Server version is 2016 and full-text search is installed.

like image 441
Ahmad Zareei Avatar asked Feb 19 '20 05:02

Ahmad Zareei


People also ask

What are full text Stopwords?

To prevent a full-text index from becoming bloated, SQL Server has a mechanism that discards commonly occurring strings that do not help the search. These discarded strings are called stopwords. During index creation, the Full-Text Engine omits stopwords from the full-text index.

How do I create a full text index?

To create a full text index choose your table and right click on that table and select “Define Full-Text Index” option. Now select Unique Index. It is compulsory that for “Full Text Index” table must have at least one unique index. Select columns name and language types for columns.


2 Answers

For use Persian language in SQL Server Stop list, Full text catalog and etc just we should use Neutral. If you don't use Neutral in catalog I suggested for you change it to Neutral, sometimes this is empty like below:

enter image description here

Your problem solve by this query for any language:

--View Stoplist word
SELECT w.stoplist_id,
   l.name,
   w.stopword,
   w.language
FROM sys.fulltext_stopwords AS w
   INNER JOIN sys.fulltext_stoplists AS l
     ON w.stoplist_id = l.stoplist_id;

-- Stopwords list
CREATE FULLTEXT STOPLIST StopListCustome;
GO

-- Add a stopword
ALTER FULLTEXT STOPLIST StopListCustome
    ADD 'SQL' LANGUAGE 'English';
GO
ALTER FULLTEXT STOPLIST StopListCustome 
    ADD 'از' LANGUAGE 'Neutral';

Find this document and code on github

And also you can use below lists for add any stop list text in Persian and English :

Download English stoplist

Download Persian or Farsi stoplist for many words

Download Persian or Farsi stoplist for standard words

like image 165
Amirhossein Avatar answered Oct 07 '22 09:10

Amirhossein


Finally I found the solution.

By default when you create a full text index it is associated with a system stoplist. The default stoplist has more than 150 words for the English language.

configure-and-manage-stopwords-and-stoplists-for-full-text-search

full-text-search-stoplist-and-stopword

Just open this file and then add your words

[SQL Server Path]\MSSQL13.MSSQLSERVER\MSSQL\FTData\tsglobal.xml

<XML ID="Microsoft Search Thesaurus">
<thesaurus xmlns="x-schema:tsSchema.xml">
<diacritics_sensitive>0</diacritics_sensitive>
    <expansion>
        <sub>Internet Explorer</sub>
        <sub>IE</sub>
        <sub>IE5</sub>
    </expansion>
    <expansion>
        <sub>سازگار سیستم خاورمیانه</sub>
        <sub>ستیران</sub>
    </expansion>
        <expansion>
        <sub>آبجی</sub>
        <sub>خواهر</sub>
    </expansion>
    <replacement>
        <pat>NT5</pat>
        <pat>W2K</pat>
        <sub>Windows 2000</sub>
    </replacement>
    <expansion>
        <sub>run</sub>
        <sub>jog</sub>
    </expansion>
</thesaurus>

And then execute this SQL command.

EXEC sys.sp_fulltext_load_thesaurus_file 0;  

And for creating a custom stop list just flow these code:

CREATE FULLTEXT STOPLIST [PersianStopList]

And then add your stop list

    ALTER FULLTEXT STOPLIST [PersianStopList] ADD 'از' LANGUAGE 'Neutral';

Complete information in the Persian language: https://www.dotnettips.info/courses/topics/13#/page/1/date/desc

like image 44
Ahmad Zareei Avatar answered Oct 07 '22 09:10

Ahmad Zareei