
Semantic search with NLP and elasticsearch

Tags: search, nlp

I am experimenting with Elasticsearch as a search server, and my task is to build "semantic" search functionality. From a short text phrase like "I have a burst pipe", the system should infer that the user is searching for a plumber and return all plumbers indexed in Elasticsearch.

Can that be done directly in a search server like Elasticsearch, or do I have to use a natural language processing (NLP) tool such as Maui Indexer? What is the exact terminology for my task: text classification? Note that the given text is very short, since it is a search phrase.

asked Jan 07 '12 20:01 by user1089363



1 Answer

There are several possible approaches, which differ in implementation complexity.

The easiest one is to create a list of topics (like plumbing), attach a bag of words to each topic (like "pipe"), assign the search request to the topic that matches the majority of its keywords, and search only within that topic. To do this, you can add a field topic to your Elasticsearch documents and make it mandatory with + in the search query.
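As a rough illustration of this keyword approach, here is a minimal Python sketch. The topic names, keyword sets, and the field names topic and description are all made up for the example, and the term filter assumes topic is mapped as a keyword field. It only builds the request body; sending it to Elasticsearch is left out.

```python
import re
from collections import Counter

# Hypothetical, hand-made topic -> bag-of-words mapping (not from the question).
TOPIC_KEYWORDS = {
    "plumbing": {"pipe", "leak", "faucet", "drain", "toilet", "burst"},
    "electrical": {"socket", "fuse", "wiring", "outlet", "breaker"},
}


def detect_topic(query):
    """Return the topic whose keywords occur most often in the query, or None."""
    tokens = re.findall(r"[a-z]+", query.lower())
    counts = Counter()
    for topic, keywords in TOPIC_KEYWORDS.items():
        counts[topic] = sum(1 for token in tokens if token in keywords)
    topic, hits = counts.most_common(1)[0]
    return topic if hits > 0 else None


def build_search_body(query):
    """Build an Elasticsearch Query DSL body restricted to the detected topic.

    Assumes documents have a 'topic' keyword field and a 'description'
    text field; both names are illustrative.
    """
    topic = detect_topic(query)
    if topic is None:
        # No topic recognised: fall back to a plain full-text match.
        return {"query": {"match": {"description": query}}}
    return {
        "query": {
            "bool": {
                # Mandatory clause: only documents in the detected topic.
                "filter": [{"term": {"topic": topic}}],
                # Optional clause: only affects ranking within the topic.
                "should": [{"match": {"description": query}}],
            }
        }
    }


body = build_search_body("I have a burst pipe")
```

Here the filter clause plays the role of the mandatory + term, so every plumber is returned even if its description never mentions a burst pipe. In query string syntax the equivalent restriction would be +topic:plumbing.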

Of course, if you have lots of documents, manually creating the topic list and the bags of words is very time-consuming. You can use machine learning to automate some of these tasks. Basically, it is enough to have a distance measure between words and/or documents to discover topics automatically (e.g. by data clustering) and to classify a query into one of these topics. A mix of these techniques may also be a good choice: for example, you can create the topics manually and assign initial documents to them, but use classification to assign queries. Take a look at Wikipedia's article on latent semantic analysis to better understand the idea, and also pay attention to the two articles linked from it on data clustering and document classification. And yes, Maui Indexer may be a good helper tool for this.
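To make the mixed approach concrete, here is a small sketch in plain Python: topics and seed documents are assigned by hand, and a query is classified by cosine similarity to each topic's summed term-count vector. The seed texts are invented for the example, and raw term counts are only a stand-in for a better distance measure (latent semantic analysis would compare vectors in a reduced space instead).

```python
import math
import re
from collections import Counter

# Hypothetical seed documents, manually assigned to topics.
SEED_DOCS = {
    "plumbing": [
        "We repair burst pipes and leaking taps",
        "Blocked drain and pipe replacement service",
    ],
    "electrical": [
        "Rewiring, fuse boxes and socket installation",
        "Emergency electrician for power outage",
    ],
}


def vectorize(text):
    """Turn text into a sparse term-count vector (no stemming, no stop words)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def classify(query):
    """Return the topic whose seed documents are closest to the query, or None."""
    query_vec = vectorize(query)
    scores = {}
    for topic, docs in SEED_DOCS.items():
        centroid = Counter()
        for doc in docs:
            centroid.update(vectorize(doc))  # sum the term counts per topic
        scores[topic] = cosine(query_vec, centroid)
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


topic = classify("I have a burst pipe")
```

With these seed texts the query shares "burst" and "pipe" with the plumbing documents and nothing with the electrical ones. Because there is no stemming, "pipes" and "pipe" count as different terms, which is one reason very short queries are hard to classify with this simple measure.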

Finally, you can try to build an engine that "understands" the meaning of the phrase (rather than just using term frequencies) and searches the appropriate topics. Most probably, this will involve natural language processing and ontology-based knowledge bases. However, this is still an area of active research, and without previous experience it will be very hard to implement something like this.

answered Oct 07 '22 04:10 by ffriend