 

Elasticsearch: Can it be used to avoid writing your own NLP? (e.g. Re-invent the wheel)

Here's a simplified example of what I'm trying to achieve - I am sure this is a pretty standard thing and I hope someone can point me in the right direction of a pattern, method, way to do this without re-inventing the wheel.

PUT /test/vendors/1
{
  "type": "clinic",
  "name": "ENT of Boston",
  "place": "Boston"  
}

PUT /test/vendors/2
{
  "type": "law firm",
  "name": "Ambulance Chasers Inc.",
  "place": "Boston"
}

Say I want to support searches like these:

"Ambulance Chasers"
"Law Firm in Boston"

I can run a search like this:

GET /test/_search
{
  "query": {
    "multi_match" : {
      "query":    "Law Firm in Boston", 
      "fields": [ "type", "place", "name" ],
      "type": "most_fields"
    }
  }
}

The thing is, this would also return ENT of Boston because it has "Boston" in its name, although that's clearly not what I'm looking for.

I know I can write my own code to analyze the search string before it's submitted to Elasticsearch, and force "Boston" to be searched only in the place field of documents. I could do that for every field and issue a pinpointed query for EXACTLY what the user needs. But is there an easier way to handle something like that which I am missing?
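A minimal sketch of that pre-processing idea in Python: tokenize the search string, route tokens that match a known place list to the place field, and search everything else against the remaining fields. The KNOWN_PLACES set and stopword list here are assumptions for illustration, not a real gazetteer.

```python
# Sketch: route query tokens to specific fields before sending to Elasticsearch.
# KNOWN_PLACES and STOPWORDS are illustrative assumptions.
KNOWN_PLACES = {"boston", "new york"}
STOPWORDS = {"in", "of", "the"}

def build_query(text):
    """Build an Elasticsearch bool query with one clause per meaningful token."""
    clauses = []
    for token in text.lower().split():
        if token in STOPWORDS:
            continue
        if token in KNOWN_PLACES:
            # Force place names to match only the "place" field.
            clauses.append({"match": {"place": token}})
        else:
            # Everything else searches the remaining fields.
            clauses.append({
                "multi_match": {
                    "query": token,
                    "fields": ["type", "name"],
                    "type": "most_fields",
                }
            })
    return {"query": {"bool": {"must": clauses}}}
```

For "Law Firm in Boston" this produces two multi_match clauses ("law", "firm") plus a place-only match on "boston", so ENT of Boston no longer matches just because of its name.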

I guess what I'm asking is whether there's a way Elasticsearch can allow me to fine-tune and "understand" what I'm looking for, without forcing me to dive deep into Natural Language Processing in my own code and re-invent the wheel.

JasonGenX Avatar asked Mar 14 '19 15:03

1 Answer

Elasticsearch "searching" is purely keyword-based.

What you do get, however, are some NLP-adjacent steps: retrieving and gathering data, extracting the required information, tokenization, stopword removal (all done by analyzers), and similarity calculations (using tf-idf and the vector space model).

The further NLP process, such as coming up with a model, training that model, and classifying text data, is not something I believe Elasticsearch has an engine for. (There is a feature called MLT (More Like This), but I'm not sure how it works; I haven't read up on it yet.)

What you can do is use Elasticsearch as the data source for your NLP engine, if you end up creating one; that way you wouldn't need to implement the basic stages mentioned above yourself.

You can check out this blog, which is quite interesting.

That said, looking at your use case, I've come up with the query below. I know it's not an exact solution, but it should give the results you are looking for.

POST <your_index_name>/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "multi_match": {
            "query": "law",
            "fields": [ "type", "place", "name"],
            "type": "most_fields"
          }
        },
        {
          "multi_match": {
            "query": "firm",
            "fields": [ "type", "place", "name"],
            "type": "most_fields"
          }
        },
        {
          "multi_match": {
            "query": "boston",
            "fields": [ "type", "place", "name"],
            "type": "most_fields"
          }
        }
      ]
    }
  }
} 

What I've done is simply create a must clause for every word in the query you posted. This should ensure you don't end up with the unwanted results.
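The per-word bool query above can also be generated programmatically for any input string, so it doesn't have to be hand-written per search. A small Python sketch (the stopword set used to drop "in" is an assumption for illustration):

```python
def per_word_bool_query(text, fields=("type", "place", "name")):
    """Build one must-clause multi_match per word, mirroring the query above.

    Dropping small connector words like "in" is an illustrative assumption.
    """
    stopwords = {"in", "of", "the"}
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "multi_match": {
                            "query": word,
                            "fields": list(fields),
                            "type": "most_fields",
                        }
                    }
                    for word in text.lower().split()
                    if word not in stopwords
                ]
            }
        }
    }
```

Calling per_word_bool_query("Law Firm in Boston") yields the same three must clauses as the hand-written query, one each for "law", "firm", and "boston".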

Let me know if it helps!

Kamal Avatar answered Nov 17 '22 21:11
