Error when searching the Content Manager with wildcards

I have noticed that if I search for certain phrases, Tridion Content Manager gives me the following error:

Unable to get the list of search results.
Unable to process the Search Request. Invalid search query: (*out*) AND RepositoryId:tcm\:0\-4\-1 AND OrganizationalItemAncestorIds:tcm\:*\-135625\-2. maxClauseCount is set to 10240
org.apache.lucene.search.BooleanQuery$TooManyClauses: maxClauseCount is set to 10240
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:136)
at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:127)
at org.apache.lucene.search.ScoringRewrite$1.addClause
[...and so on]

In the example above I am searching for the phrase *out*. It also fails when I search for *a* and various other short wildcard queries. out* works fine, and *out* works fine if I limit the search to just the item titles. It doesn't matter whether I search within "all publications" or a particular folder. It doesn't even matter if I limit the search results to the minimum (50).

Maybe this is something to do with the number of results returned?

The exact same search works fine on Tridion 5.3. I presume it isn't using Lucene?

Any ideas on how to fix this?

Kevin Brydon asked Nov 29 '12 14:11


1 Answer

Lucene's query parser does not allow leading wildcards by default (version R5.3 of Tridion used a Verity implementation that allowed them), because of the way terms are indexed and searched. A leading wildcard effectively forces a scan of every term in the index for matches, rather than the more typical and performant approach of using the index to find matches directly (see the Lucene FAQ).
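To illustrate why a leading wildcard is expensive, here is a minimal sketch in plain Python. It is not Lucene's actual implementation: it assumes a sorted term dictionary, and the names prefix_terms, infix_terms, expand_to_clauses and MAX_CLAUSES are hypothetical. A prefix query such as out* can binary-search to the matching range, while a query such as *out* has to test every term, and each matching term becomes one clause in the rewritten query.

```python
from bisect import bisect_left

# Hypothetical clause limit, standing in for Lucene's maxClauseCount.
MAX_CLAUSES = 4


def prefix_terms(sorted_terms, prefix):
    """Terms matching prefix*: binary search to the start, then walk the run."""
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order means no later term can match
        matches.append(term)
    return matches


def infix_terms(sorted_terms, fragment):
    """Terms matching *fragment*: ordering does not help, so every term is tested."""
    return [term for term in sorted_terms if fragment in term]


def expand_to_clauses(matches):
    """Each matching term becomes one clause; exceeding the limit is an error."""
    if len(matches) > MAX_CLAUSES:
        raise ValueError(f"too many clauses: {len(matches)} > {MAX_CLAUSES}")
    return list(matches)


terms = sorted(["about", "layout", "out", "outer", "output", "route", "shout", "without"])
prefix_matches = prefix_terms(terms, "out")
infix_matches = infix_terms(terms, "out")
```

In this toy dictionary the prefix query expands to three terms and stays under the limit, while the infix query matches all eight and would trip it. This is the same shape of failure as the TooManyClauses exception in the question.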

You can enable this by calling QueryParser.setAllowLeadingWildcard(true), but I strongly recommend against it in most cases.

A better approach might be to keep terms that require a leading wildcard out of the query and apply them as a filter on the results instead (not really feasible if the leading wildcard term is the only term being searched on).
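As a rough sketch of that filtering idea, assuming the index has already returned a manageable candidate list from the wildcard-free part of the query: the candidates list, its item ids and titles, and the filter_by_pattern helper are all made up for illustration, and the matching is done in application code with Python's standard fnmatch module rather than by the search engine.

```python
from fnmatch import fnmatchcase

# Hypothetical stand-in for results returned by a wildcard-free index query.
candidates = [
    {"id": "item-1", "title": "Checkout page"},
    {"id": "item-2", "title": "Contact form"},
    {"id": "item-3", "title": "About us"},
]


def filter_by_pattern(items, field, pattern):
    """Keep items whose field matches a shell-style wildcard pattern, ignoring case."""
    lowered = pattern.lower()
    return [item for item in items if fnmatchcase(item[field].lower(), lowered)]


matched = filter_by_pattern(candidates, "title", "*out*")
```

This only pays off when the other query terms narrow the candidates enough that scanning them in memory is cheap.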

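Also, Lucene provides the ReverseStringFilter, a token filter that reverses each term. Indexing the reversed terms alongside the originals lets a leading-wildcard search be rewritten as a trailing-wildcard search against the reversed form. This would probably be the best way to create your index to enable leading wildcard searching.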
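The idea behind reversed terms can be sketched in plain Python. This is a conceptual illustration only, not how ReverseStringFilter is wired into an analyzer, and build_reversed_index and suffix_terms are hypothetical names: a query like *out is turned into the prefix query tuo* over the reversed terms, which a sorted dictionary can answer without a full scan.

```python
from bisect import bisect_left


def build_reversed_index(terms):
    """Sorted list of every term spelled backwards."""
    return sorted(term[::-1] for term in terms)


def suffix_terms(reversed_index, suffix):
    """Terms matching *suffix, answered as a prefix lookup on reversed terms."""
    needle = suffix[::-1]
    start = bisect_left(reversed_index, needle)
    matches = []
    for reversed_term in reversed_index[start:]:
        if not reversed_term.startswith(needle):
            break  # sorted order means no later entry can match
        matches.append(reversed_term[::-1])  # restore the original spelling
    return sorted(matches)


terms = ["about", "layout", "outer", "route", "shout"]
reversed_index = build_reversed_index(terms)
ending_in_out = suffix_terms(reversed_index, "out")
```

Note that this finds terms ending in "out" only. A term such as "outer" or "route", where the fragment sits at the start or in the middle, is not returned by the reversed lookup.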

Right off, I don't think either of these really handles a query like *out* though. Representing your data as N-Grams might be an option for that (see NGramTokenizer).
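A minimal sketch of the N-gram idea, again in plain Python rather than with NGramTokenizer itself. The gram size of 3 and the names ngrams, build_ngram_index and infix_lookup are assumptions for illustration. Each term is indexed under every 3-character substring, so a query like *out* becomes an exact lookup of the gram "out" instead of a wildcard expansion.

```python
from collections import defaultdict


def ngrams(term, n):
    """All distinct substrings of length n in term."""
    return {term[i:i + n] for i in range(len(term) - n + 1)}


def build_ngram_index(terms, n=3):
    """Map each n-gram to the set of terms that contain it."""
    index = defaultdict(set)
    for term in terms:
        for gram in ngrams(term, n):
            index[gram].add(term)
    return index


def infix_lookup(index, fragment, n=3):
    """Terms containing fragment; the fragment must be at least n characters long."""
    grams = ngrams(fragment, n)
    if not grams:
        raise ValueError(f"fragment must have at least {n} characters")
    candidates = set.intersection(*(index.get(gram, set()) for gram in grams))
    # Sharing every gram does not prove the grams are adjacent, so verify.
    return sorted(term for term in candidates if fragment in term)


terms = ["about", "contact", "layout", "outer", "route", "shout"]
index = build_ngram_index(terms)
containing_out = infix_lookup(index, "out")
```

The trade-off is a larger index, and with a gram size of 3 a fragment shorter than three characters (such as the *a* search from the question) cannot be answered this way without also indexing smaller grams.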

femtoRgon answered Oct 13 '22 00:10