
Where can I find a list of 'Stop' words for Oracle fulltext search?

I have a client testing full-text search (example below) on a new Oracle UCM site. The random text string they chose to test with was 'test only', and the search failed. From my testing, 'only' appears to be a reserved word: it is never returned from a full-text search, although it is returned from metadata searches.

I've spent the morning searching oracle.com and found this, which seems pretty comprehensive, yet it does not include 'only'.

So my question is: is 'only' a reserved word? And where can I find a complete list of reserved words for Oracle full-text search (10g)?

Full-text search string example:

(<ftx>test only</ftx>)


Update: I have done some more testing. It seems to ignore words that indicate places or times: only, some, until, when, while, where, there, here, near, that, who, about, this, them.

Can anyone confirm this? I can't find it documented by Oracle anywhere.


Update 2 (post-answer): I should have been looking for 'stop' words, not 'reserved' words. I have updated the question title and tags to reflect this.
Tyronomo asked Feb 27 '23 21:02



2 Answers

Additional answers:

  • See the default Oracle (11g) stopword lists here: http://download.oracle.com/docs/cd/B28359_01/text.111/b28304/astopsup.htm#i634475

  • The following query lists the stopwords from all stoplists (to be run in the CTXSYS schema):

SELECT *
FROM DR$STOPWORD sw
LEFT JOIN DR$STOPLIST sl ON sw.SPW_SPL_ID = sl.SPL_ID;

In the results, the SPL_* fields come from the DR$STOPLIST system table, and the SPW_* fields come from the DR$STOPWORD table.

  • From a user schema, user-defined stoplists and stopwords can be retrieved with:

SELECT * FROM CTX_USER_STOPLISTS;
SELECT * FROM CTX_USER_STOPWORDS;
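
To see which stopwords are in effect for one particular index, rather than for a stoplist, a query along these lines may help. This is only a sketch and not part of the original answer: it assumes the CTX_USER_INDEX_VALUES view reports stoplist entries under the class 'STOPLIST', and MY_TEXT_INDEX is a hypothetical index name.

```sql
-- Sketch: list the stopwords attached to one Oracle Text index
-- owned by the current user.
-- MY_TEXT_INDEX is a placeholder; replace it with the real index name.
SELECT ixv_index_name, ixv_attribute, ixv_value
FROM   ctx_user_index_values
WHERE  ixv_index_name = 'MY_TEXT_INDEX'
AND    ixv_class = 'STOPLIST'
ORDER  BY ixv_value;
```

If 'only' shows up in the IXV_VALUE column, the index was built with a stoplist that contains it.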
Frosty Z answered Mar 05 '23 16:03



I bet the system is trying to automatically ignore frequently occurring words. That would explain why you cannot find 'only' but can find 'onnly'. Can you search for 'a', 'an', ...?

The words you listed as not working look like very common words that are rarely the primary words in a sentence. Given this, they are unlikely to be words you would search for in a full-text search.

What are the odds that you are looking for an article that includes the word 'that', and the inclusion of that word is the only fact you have about the article?

I think I found your list, ironically on the wiki page of the last company I started: http://www.sugarcrm.com/wiki/index.php?title=Overview_of_Full_Text_Stop_Words#Default_Stop_Words_.28for_English.29

2.10.3 Modifying the Default Stoplist

The default stoplist is always named CTXSYS.DEFAULT_STOPLIST. You can use the following procedures to modify this stoplist:

 • CTX_DDL.ADD_STOPWORD
 • CTX_DDL.REMOVE_STOPWORD
 • CTX_DDL.ADD_STOPTHEME
 • CTX_DDL.ADD_STOPCLASS

When you modify CTXSYS.DEFAULT_STOPLIST with the CTX_DDL package, you must re-create your index for the changes to take effect.
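
As a rough sketch of the procedure described above, removing 'only' from the default stoplist and re-creating an index might look like the following. The table name docs, column body, and index name docs_body_idx are hypothetical, and the block probably has to be run as CTXSYS or another suitably privileged user. Whether this is appropriate for an index managed by UCM is not covered here.

```sql
-- Sketch only: docs, body and docs_body_idx are made-up names.

-- 1. Remove the word from the default stoplist.
BEGIN
  CTX_DDL.REMOVE_STOPWORD('CTXSYS.DEFAULT_STOPLIST', 'only');
END;
/

-- 2. Re-create the index so the change takes effect.
--    An index created without an explicit STOPLIST parameter
--    uses CTXSYS.DEFAULT_STOPLIST.
DROP INDEX docs_body_idx;

CREATE INDEX docs_body_idx ON docs (body)
  INDEXTYPE IS CTXSYS.CONTEXT;
```

Dropping and re-creating a large text index can take a while, so this is best tried on a test system first.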

Default stopword list:

a        he     out    up
be       more   their  at
had      one    will   from
it       than   and    is
only     when   corp   not
she      also   in     says
was      by     ms     to
about    her    over
because  most   there
has      or     with
its      that   are
of       which  could
some     an     inc
we       can    mz
after    his    s
been     mr     they
have     other  would
last     the    as
on       who    for
such     any    into
were     co     no
all      if     so
but      mrs    this

Update: A nice whitepaper from Oracle that covers how full-text searching works can be downloaded from http://www.oracle.com/technology/products/text/pdf/text_techwp.pdf

It mentions stopwords and the fact that there is a default list, but does not list the words themselves.

TheJacobTaylor answered Mar 05 '23 14:03
