Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to build a free flow search on sql-server database tables?

I am creating a web application similar to quora/stackoverflow that allows users to perform CRUD operations on question bank (question bank is very small, ~500 questions with maximum 5 answers per question) along with search. How can I build a free flow search or auto suggestion functionality on question bank?

Tech Stack:

  1. Restful services for CRUD operations are exposed from Java EE (spring boot) based server hosted on tomcat.
  2. Frontend app is built using React and served from Node server.
  3. Database used is SQL Server.

Question Bank Schema:

questionId: String

tags: [String]

title: String

description: String

answernotes: [String]

applicableJobRole: [Intern, Full Time]

state: [Approved, UnderReview, Obsolete]

difficultyLevel: [Easy, Medium, Hard]

noOfTimesUsed: int

createdBy: user

Requirement:

For my web application, I want to build a free flow search or auto suggestion functionality on question bank (that searches existing questions in questionbank on fields like description, title, answer notes etc). This is to prevent duplicate questions from getting created in the question bank and to help users search similar questions.

For ex. something similar to the image below (snipped from Quora).

Questions:

  1. Is elastic search a good fit for indexing or building a search functionality for such a small data set? or,
  2. Should I build indexes using self created data structure like suffix tree in application server itself?

Are there some other quick to use recommended solutions for building free flow search functionality for such use-case?

Leads/pointers here are appreciated.

Search from Quora

like image 807
Lokesh Agrawal Avatar asked May 10 '18 16:05

Lokesh Agrawal


3 Answers

Should I build indexes using self created data structure like suffix tree in application server itself?

Please don't! This will take a long time and will likely be extremely error prone. Managing your indexes in the application server itself could be a viable solution, but using a library would definitely help you a lot.

While Elasticsearch is a possibility, this amount of data can easily be managed with a simple Lucene index. This decisions also depends on the redundancy you want to ensure for your system. If you are not happy with your one service being unreachable every time your one server is down, you should look at Lucene's index replication features or at using a redundant Elasticsearch installation.

like image 189
Michele Palmia Avatar answered Oct 15 '22 12:10

Michele Palmia


Is elastic search a good fit for indexing or building a search functionality for such a small data set?

Yes it is. I think the size of your data you like to index is not the main issue here. A library or search engine like elasticsearch, solr or lucene have some base functionality which you really use if you wanna implement a user-friendly and state of the art search. Just some use cases you really struggle if you don't use one of these:

  • Language based analysis (stemming)
  • better support for multi-language use cases
  • Fulltext search
  • Auto suggestion with Lhevenstein distance

Should I build indexes using self created data structure like suffix tree in application server itself?

Based on micpalmias answer: he's right. don't do this! use a search engine / library for this. The're design to do this for you, and they do it really well.

like image 1
dom Avatar answered Oct 15 '22 12:10

dom


ElasticSearch and Solr are great full text search servers, and Lucene is a great full text search library that you can use for your scenario. But, did you know SQL Server has full text search capabilities? by using CONTAINS you can search for:

  • A word or phrase.
  • The prefix of a word or phrase.
  • A word near another word.
  • A word inflectionally generated from another (for example, the word drive is the inflectional stem of drives, drove, driving, and driven).
  • A word that is a synonym of another word using a thesaurus (for example, the word "metal" can have synonyms such as "aluminum" and "steel").

To be able to use CONTAINS you need to setup Full Text Search

SQL Server Full Text Search is not as fancy or fast than ElasticSearch or Lucene, but:

  • It's part of SQL Server, so you don't need another moving part.
  • Indexes are automatically updated. With ElasticSearch or Lunece you need to use some mechanism to update the indexes when base tables change.
  • I think it's enough for your scenario.
like image 1
Jesús López Avatar answered Oct 15 '22 12:10

Jesús López