
Elasticsearch: match all words from the document in the search query

We can require ALL words of a query to be present in a specific document field like this:

{
  "query": {
    "match": {
      "title": {
        "query": "Black Nike Mens",
        "operator": "and"
      }
    }
  }
}

This searches for the words Black, Nike and Mens in the title field and returns only those documents whose title contains ALL of these words.

But what I am trying to do is a little different.

I want a lookup that returns a document only if all the words in the document's title field are present in my search query.

For example:

Suppose there is a document with the title "Nike Free Sparq Mens White" in the Elasticsearch index.

If I search with the query "Nike Free Sparq 09 - Mens - White/Black/Varsity Red", it should return this document, because all the words in the document's title exist in my query.

But if I search with the query "Nike Free Lebron - Mens - White/Black", it should NOT return the document, because the word Sparq from the title is missing from my query.

This is a sort of reverse AND-operator search.

Is this possible? If yes, then how?

Imran Ahmed asked Sep 15 '15 07:09




2 Answers

I finally got it to work, though not with a direct method.

This is what I do:

  • Create a clean list of words from the source query by:
    • changing it to lower case
    • replacing any special characters and punctuation with spaces
    • removing duplicate words
  • Search using a normal match query with the OR operator, passing the cleaned words joined into one string
  • This returns the most relevant hits
  • Take those hits one by one and do a word-by-word comparison in PHP (or whatever programming language you use)
  • For each hit, check every word of the hit's title against the words of the source query; keep the hit only if all of its words are present in the source query string
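The steps above could be sketched roughly as follows. This is a minimal illustration in Python rather than PHP, not the exact code I run. The function names (`clean_words`, `build_or_query`, `filter_hits`) are made up for this example, the regex only keeps ASCII letters and digits, and it assumes the hits have the usual `_source` layout with a `title` field. The word splitting here is also only an approximation of whatever analyzer the index uses.

```python
import re


def clean_words(text):
    """Lowercase, turn punctuation/special chars into spaces, drop duplicates.

    Assumption: only ASCII letters and digits count as word characters.
    """
    tokens = re.sub(r"[^a-z0-9]+", " ", text.lower()).split()
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(tokens))


def build_or_query(words, field="title"):
    """Request body for a normal match query with the OR operator."""
    return {
        "query": {
            "match": {
                field: {"query": " ".join(words), "operator": "or"}
            }
        }
    }


def filter_hits(hits, query_words, field="title"):
    """Keep only hits whose every title word appears in the source query."""
    allowed = set(query_words)
    return [
        hit for hit in hits
        if set(clean_words(hit["_source"][field])) <= allowed
    ]
```

You would send the body from `build_or_query` to the search endpoint with whatever client you use, then pass the `hits.hits` array of the response to `filter_hits` together with the cleaned query words. Since the final check happens outside Elasticsearch, only the hits actually fetched (the `size` of the request) are ever considered.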

This worked well enough for me, unless someone knows a direct method in the Elasticsearch query language.

Imran Ahmed answered Oct 06 '22 17:10



The Percolate query should help here. You'd register your documents as queries, making "Nike Free Sparq Mens White" a match query with an AND operator.

Then your search text becomes the document you percolate, e.g. one with "Nike Free Sparq 09 - Mens - White/Black/Varsity Red" as its content. You should get "Nike Free Sparq Mens White" back, because the percolated document contains all of that query's terms.

Unfortunately, this won't scale well (e.g. if you have millions of documents, it might get slow).
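As a rough sketch of what that could look like, here are the three request bodies built as plain Python dicts. This assumes a newer Elasticsearch version where percolation is done with a `percolator` field type and the `percolate` query, and it uses typeless (7.x-style) mappings; on older versions the API is different. The field names `query` and `title` and the helper function names are just choices for this example.

```python
import json

QUERY_FIELD = "query"  # assumed name of the percolator field
TEXT_FIELD = "title"   # assumed name of the text field the queries run against


def percolator_index_body():
    """Index creation body: one text field plus one percolator field."""
    return {
        "mappings": {
            "properties": {
                TEXT_FIELD: {"type": "text"},
                QUERY_FIELD: {"type": "percolator"},
            }
        }
    }


def stored_query_for(title):
    """Register a product title as a match query with the AND operator."""
    return {
        QUERY_FIELD: {
            "match": {TEXT_FIELD: {"query": title, "operator": "and"}}
        }
    }


def percolate_search_body(search_text):
    """Search body that percolates the user's search text as a document."""
    return {
        "query": {
            "percolate": {
                "field": QUERY_FIELD,
                "document": {TEXT_FIELD: search_text},
            }
        }
    }


if __name__ == "__main__":
    # Print the bodies so they can be pasted into any HTTP client.
    print(json.dumps(percolator_index_body(), indent=2))
    print(json.dumps(stored_query_for("Nike Free Sparq Mens White"), indent=2))
    print(json.dumps(percolate_search_body(
        "Nike Free Sparq 09 - Mens - White/Black/Varsity Red"), indent=2))
```

Each product title is indexed once as a stored query, and every incoming search string is sent with the percolate body. The hits are the stored queries (titles) whose terms all appear in the search string.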

Radu Gheorghe answered Oct 06 '22 16:10
