How to build semantic search for a given domain


We are trying to build semantic search over our own domain-specific data (for example, sentences about automobiles).

Our data is just a set of sentences. Given a phrase, we want to get back the sentences that:

  1. Are similar to that phrase
  2. Contain a part that is similar to the phrase
  3. Have a contextually similar meaning


For example, if I search for the phrase "Buying Experience", I should get back sentences like:

  • I never thought car buying could take less than 30 minutes to sign and buy.
  • I found a car that i liked and the purchase process was straightforward and easy
  • I absolutely hated going car shopping, but today i’m glad i did


I want to emphasize that we are looking for contextual similarity, not just a brute-force word search.

If a sentence expresses the same idea in different words, the search should still find it.
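
To make the point concrete, here is a minimal sketch of what a plain keyword match sees for the example query. The helper names `tokens` and `keyword_overlap` are made up for illustration, and the tokenizer is a deliberately naive assumption (lowercase, alphabetic words only, no stemming):

```python
import re

def tokens(text):
    """Lowercase a string and split it into a set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def keyword_overlap(query, sentence):
    """Return the query words that literally appear in the sentence."""
    return tokens(query) & tokens(sentence)

query = "Buying Experience"
sentences = [
    "I never thought car buying could take less than 30 minutes to sign and buy.",
    "I found a car that i liked and the purchase process was straightforward and easy",
    "I absolutely hated going car shopping, but today i’m glad i did",
]

# Show which query words each expected sentence actually shares with the query.
for s in sentences:
    print(sorted(keyword_overlap(query, s)), "<-", s)
```

Only the first sentence shares a literal word ("buying") with the query. The other two talk about a "purchase process" and "car shopping", so a word-level match has nothing to hold on to, even though all three are results we expect.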

Things that we have already tried:

  1. Open Semantic Search. The problem we faced here was generating an ontology from the data we have or, failing that, finding an existing ontology for the domains we are interested in.

  2. Elasticsearch (BM25 + tf-idf vectors). It returned a few sentences, but precision was not great, and accuracy was poor as well: evaluated against a human-curated dataset, it retrieved only about 10% of the expected sentences.

  3. Different embeddings, such as the ones listed in sentence-transformers. We also went through the example and evaluated against our human-curated set, and accuracy was again very low.

  4. ELMo. This was better, but accuracy was still lower than we expected, and there is a cognitive load in deciding the cosine similarity value below which sentences should be discarded. The same applies to point 3.
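
For the cosine cutoff mentioned in points 3 and 4, one option we could imagine is to pick the value from the human-curated set instead of by eye. The sketch below is only an illustration under assumptions: the function names are hypothetical, the scores are invented toy numbers rather than our measurements, and F1 is just one possible criterion for the choice.

```python
def precision_recall(scored, threshold):
    """Precision and recall when every sentence scoring >= threshold is returned.

    `scored` is a list of (cosine_score, is_relevant) pairs, where is_relevant
    comes from the human-curated labels.
    """
    returned = [rel for score, rel in scored if score >= threshold]
    true_pos = sum(returned)
    total_relevant = sum(rel for _, rel in scored)
    precision = true_pos / len(returned) if returned else 0.0
    recall = true_pos / total_relevant if total_relevant else 0.0
    return precision, recall

def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0

def best_threshold(scored):
    """Try each observed score as a cutoff and keep the one with the highest F1."""
    candidates = sorted({score for score, _ in scored})
    return max(candidates, key=lambda t: f1(*precision_recall(scored, t)))

# Hypothetical (score, label) pairs for one query; not real measurements.
scored = [
    (0.82, True), (0.74, True), (0.61, False),
    (0.58, True), (0.40, False), (0.33, False),
]

t = best_threshold(scored)
print(t, precision_recall(scored, t))
```

A cutoff chosen this way is only as good as the labeled set it was tuned on, so it would need to be checked on queries that were held out from the tuning.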

Any help will be appreciated. Thanks a lot in advance.