
Simple in-memory full-text search solution

I have a small website running on Java with probably a dozen Markdown files. I want to provide full-text search so that users can quickly find those files. Since the site is small, I can afford to build the index in memory each time the web app starts. Any suggestions?

Note

  1. I would like to stay away from any database solution, SQL or NoSQL.

  2. I would prefer a solution that is provided as a library rather than built into an XX framework.

Gelin Luo asked Jan 27 '13 02:01



3 Answers

Use one of the in-memory databases, either H2 or HSQLDB. Then, for the full-text search part, use Hibernate Search. It works with either database and keeps you from having to deal with Lucene directly: you annotate your entities and all the indexing happens automatically. If you want to do things like boost fields, you can do that with a simple annotation.

Rob answered Sep 21 '22 16:09


As a side project, I have implemented a simple in-memory text search solution for Java:

https://github.com/bradforj287/SimpleTextSearch

Key Features:

  • Inverted index
  • Cosine similarity algorithm with TF-IDF ranking
  • Multithreaded index creation and searching
  • Word stemming (Snowball stemmer)
  • Strips HTML tags automatically
  • Stop words
  • String tokenizer (Stanford NLP)

You might want to take a look.
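
To make the first two features in the list above more concrete (an inverted index ranked by TF-IDF cosine similarity), here is a minimal sketch in plain Java. It is not the SimpleTextSearch API: the class name `TinySearch`, its methods, and the IDF smoothing are assumptions made for illustration, and it leaves out stemming, stop words, HTML stripping, and multithreading.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical in-memory index: inverted index plus TF-IDF cosine ranking. */
final class TinySearch {
    // Inverted index: term -> (document id -> raw term count).
    private final Map<String, Map<String, Integer>> postings = new HashMap<>();
    private final List<String> docIds = new ArrayList<>();

    /** Lower-cases the text and splits on anything that is not a letter or digit. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    /** Adds one document (for example, the contents of one Markdown file). */
    void add(String docId, String text) {
        docIds.add(docId);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, k -> new HashMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    // Smoothed inverse document frequency (an assumption of this sketch); always positive.
    private double idf(String term) {
        Map<String, Integer> p = postings.get(term);
        int df = (p == null) ? 0 : p.size();
        return Math.log(1.0 + (double) docIds.size() / (1 + df));
    }

    /** Returns ids of documents sharing at least one query term, best match first. */
    List<String> search(String query) {
        Map<String, Integer> queryCounts = new HashMap<>();
        for (String t : tokenize(query)) queryCounts.merge(t, 1, Integer::sum);

        // Squared length of every document vector; recomputed per query,
        // which is acceptable for a dozen small files.
        Map<String, Double> norm = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> e : postings.entrySet()) {
            double termIdf = idf(e.getKey());
            for (Map.Entry<String, Integer> d : e.getValue().entrySet()) {
                double w = d.getValue() * termIdf;
                norm.merge(d.getKey(), w * w, Double::sum);
            }
        }

        // Dot product of the query vector with each document that shares a term.
        Map<String, Double> dot = new HashMap<>();
        for (Map.Entry<String, Integer> e : queryCounts.entrySet()) {
            Map<String, Integer> p = postings.get(e.getKey());
            if (p == null) continue;
            double termIdf = idf(e.getKey());
            double queryWeight = e.getValue() * termIdf;
            for (Map.Entry<String, Integer> d : p.entrySet()) {
                dot.merge(d.getKey(), queryWeight * d.getValue() * termIdf, Double::sum);
            }
        }

        // The query length is the same for every document, so dividing by the
        // document length alone gives the same order as full cosine similarity.
        List<String> hits = new ArrayList<>(dot.keySet());
        hits.sort((a, b) -> Double.compare(
                dot.get(b) / Math.sqrt(norm.get(b)),
                dot.get(a) / Math.sqrt(norm.get(a))));
        return hits;
    }

    public static void main(String[] args) {
        TinySearch index = new TinySearch();
        index.add("install.md", "How to install the app on Ubuntu");
        index.add("search.md", "Full text search with an inverted index");
        index.add("faq.md", "Frequently asked questions about search and install");
        System.out.println(index.search("inverted index search")); // [search.md, faq.md]
    }
}
```

For the use case in the question, `add` would be called once per Markdown file at web app startup; the file names and contents in `main` are made up for the demo.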

bradforj287 answered Sep 18 '22 16:09


Drop in Apache Lucene, the more-or-less gold standard in full-text search. It is happy to operate in memory.

bmargulies answered Sep 19 '22 16:09