I am working on a django application with sqlite3 database, that has a fixed database content. By fixed I mean the content of the db won't change over time. The model is something like this:
class QScript(models.Model):
ch_no = models.IntegerField()
v_no = models.IntegerField()
v = models.TextField()
There are around 6500 records in the table. Given a text that may have some words missing, or some words misspelled, I need to determine its ch_no
and v_no
. For example, if there is a v
field in db with text "This is an example verse", a given text like "This an egsample verse" should give me the ch_no
and v_no
from db. This can be done using Full text search I believe.
can full-text search do this? My guess from what I have studied, it can, as said in sqlite3 page: full-text searches is "what Google, Yahoo, and Bing do with documents placed on the World Wide Web". Cited in SO, I read this article too, along with many others, but didn't find anything that closely matches my requirements.
How to use FTS in django models? I read this but it didn't help. It seems too outdated. Read here that: "...requires direct manipulation of the database to add the full-text index". Searching gives mostly MySQL related info, but I need to do it in sqlite3. So how to do that direct manipulation in sqlite3?
Is my choice of sticking to sqlite3 correct? Or should I use something different (like haystack+elasticsearch as said by Alex Morozov)? My db will not grow any larger, and I have studied that for small sized db, sqlite is almost always better (my situation matches the fourth in sqlite's when to use checklist).
Overview of FTS5. FTS5 is an SQLite virtual table module that provides full-text search functionality to database applications. In their most elementary form, full-text search engines allow the user to efficiently search a large collection of documents for the subset that contain one or more instances of a search term.
SQLite Python: Querying Data First, establish a connection to the SQLite database by creating a Connection object. Next, create a Cursor object using the cursor method of the Connection object. Then, execute a SELECT statement. After that, call the fetchall() method of the cursor object to fetch the data.
SQLite's FTS engine is based on tokens - keywords that the search engine tries to match.
A variety of tokenizers are available, but they are relatively simple. The "simple" tokenizer simply splits up each word and lowercases it: for example, in the string "The quick brown fox jumps over the lazy dog", the word "jumps" would match, but not "jump". The "porter" tokenizer is a bit more advanced, stripping the conjugations of words, so that "jumps" and "jumping" would match, but a typo like "jmups" would not.
In short, the SQLite FTS extension is fairly basic, and isn't meant to compete with, say, Google.
As for Django integration, I don't believe there is any. You will likely need to use Django's interface for raw SQL queries, for both creating and querying the FTS table.
I think that while sqlite is an amazing piece of software, its full-text search capabilities are quite limited. Instead you could index your database using Haystack Django app with some backend like Elasticsearch. Having this setup (and still being able to access your sqlite database) seems to me the most robust and flexible way in terms of FTS.
Elasticsearch has a fuzzy search based on the Levenshtein distance (in a nutshell, it would handle your "egsample" queries). So all you need is to make a right type of query:
from haystack.forms import SearchForm
from haystack.generic_views import SearchView
from haystack import indexes
class QScriptIndex(indexes.SearchIndex, indexes.Indexable):
v = indexes.CharField(document=True)
def get_model(self):
return QScript
class QScriptSearchForm(SearchForm):
text_fuzzy = forms.CharField(required=False)
def search(self):
sqs = super(QScriptSearchForm, self).search()
if not self.is_valid():
return self.no_query_found()
text_fuzzy = self.cleaned_data.get('text_fuzzy')
if text_fuzzy:
sqs = sqs.filter(text__fuzzy=text_fuzzy)
return sqs
class QScriptSearchView(SearchView):
form_class = QScriptSearchForm
Update: As long as PostgreSQL has the Levenshtein distance function, you could also leverage it as the Haystack backend as well as a standalone search engine. If you choose the second way, you'll have to implement a custom query expression, which is relatively easy if you're using a recent version of Django.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With