How would I search for a specific value and for blank values at the same time in a multivalued facet field in Solr?

I have an application where users can pick car parts. They pick their vehicle and then pick vehicle attributes as facets. After they select their vehicle, they can pick facets like engine size, for example, to narrow down the list of results. The problem was, not all documents have an engine size (it's an empty value in Solr), as it doesn't matter for all parts. For example, an engine size rarely matters for an air filter. So even if a user picked 3.5L for their engine size, I still wanted to show the air filters on the screen as a possible part the user could pick. I did some searching and the following facet query works perfectly:

enginesize:"3.5" OR enginesize:(*:* AND -enginesize:[* TO *]) 

This query would match either 3.5 or would match records where there was no value for the engine size field (no value meant it didn't matter, and it fit the car). Perfect...

THE PROBLEM: I recently made the vehicle attribute fields multivalued, so I could store the attributes for each part as a list. I then applied faceting to them, and it worked fine. However, the problem came up when I applied the query above. Selecting the enginesize facet narrowed the results down to only the documents that have that engine size, but records (I use "record" to mean document) that had empty values (i.e. "") for enginesize were not appearing. The query above does not work for multivalued fields the way it did when enginesize was a single-valued field.

Example:

 <doc> 
  <str name="part">engine mount</str>
  <arr name="enginesize">
   <str/>
   <str/>
   <str>3.5</str>
   <str>3.5</str>
   <str>3.5</str>
   <str>3.5</str>
   <str>3.5</str>
  </arr>
 </doc>

<doc> 
  <str name="part">engine bolt</str>
  <arr name="enginesize">
   <str>6</str>
   <str>6</str>
   <str>6</str>
   <str>6</str>
   <str>6</str>
  </arr>
 </doc>

 <doc> 
  <str name="part">air filter</str>
  <arr name="enginesize">
   <str/>
   <str/>
   <str></str>
   <str></str>
   <str></str>
   <str></str>
   <str></str>
  </arr>
 </doc>

What I am looking for is a query that will pull back documents 1 and 3 above when I do a facet search for engine size 3.5. The first document (the engine mount) matches, because one of the values in its multivalued "enginesize" field is 3.5. However, the third document (the air filter) doesn't get returned because of the empty <str> values. I do not want to return the second document at all, because it doesn't match the facet value.

I basically want a query that will match empty string values for a given facet and also match the actual value, so I get both documents returned.
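
To pin down the matching rule being asked for, here is a minimal sketch in plain Python (not Solr). The `matches` helper and the `docs` dictionary are hypothetical names that mirror the three example documents above, and the sketch assumes that both "" and a single space count as blank.

```python
def matches(values, wanted):
    """Return True if the part lists the wanted value, or lists no real value at all."""
    # Assumption: "" and " " both count as blank.
    real = [v for v in values if v.strip()]
    return wanted in real or not real


# Hypothetical in-memory copies of the three example documents.
docs = {
    "engine mount": ["", "", "3.5", "3.5", "3.5", "3.5", "3.5"],
    "engine bolt": ["6", "6", "6", "6", "6"],
    "air filter": ["", "", "", "", "", "", ""],
}

selected = [part for part, sizes in docs.items() if matches(sizes, "3.5")]
print(selected)  # ['engine mount', 'air filter']
```

The engine bolt is excluded because it has real values and none of them is 3.5.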

Does someone have a query that would return document 1 and document 3 (the engine mount and the air filter), but not the engine bolt document?

I tried the following without success (including the one at the very top of this question):

// returns everything
enginesize:"3.5"    OR  (enginesize:[* TO *] )
// only returns document 1
enginesize:"3.5"    OR  (enginesize:["" TO ""] AND -enginesize:"3.5")
// only returns document 1
enginesize:"3.5" OR (enginesize:"")

I imported the data above using a CSV file, and I set the field keepEmpty=true. I also tried manually inserting a space into the field when I generated the CSV file (which gives <str> </str> instead of the previous <str/>), and then retried the queries. Doing that, I got the following results:

// returns document 1
enginesize:"3.5" OR enginesize:(*:* AND -enginesize:[* TO *])
// returns all documents
enginesize:"3.5"    OR  (enginesize:["" TO ""] AND -enginesize:"3.5")
// returns all documents
enginesize:"3.5" OR (enginesize:"")

Does anyone have a query that would work for either situation, whether I have a space as the blank value or simply no value at all?

Dan asked Feb 12 '10 09:02



1 Answer

How about changing how you index, instead of how you query?

Instead of trying to index "engine size doesn't matter" as an empty value, index it as the literal value "ANY".

Then your query simply becomes enginesize:"3.5" OR (enginesize:ANY)
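
As a rough sketch of that idea, the values could be normalized before they are written to the CSV file. Everything here is an assumption for illustration: `normalize_sizes` and `build_filter` are hypothetical helpers, not part of Solr, and the sketch assumes that a part with at least one real engine size should not also be tagged ANY.

```python
ANY = "ANY"  # sentinel value; assumed never to collide with a real engine size


def normalize_sizes(values):
    """Drop blank and duplicate entries; fall back to the sentinel if nothing is left."""
    real = []
    for v in values:
        v = v.strip()  # treats "" and " " the same way
        if v and v not in real:
            real.append(v)
    return real or [ANY]


def build_filter(field, wanted):
    """Build the query string from this answer for one selected facet value."""
    return f'{field}:"{wanted}" OR ({field}:{ANY})'


print(normalize_sizes(["", "", "3.5", "3.5"]))  # ['3.5']
print(normalize_sizes(["", " ", ""]))           # ['ANY']
print(build_filter("enginesize", "3.5"))        # enginesize:"3.5" OR (enginesize:ANY)
```

Because blanks are normalized at index time, the same query works whether the source data used an empty string or a space for "doesn't matter".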

ThoughtfulHacking answered Sep 19 '22 16:09