Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Google Firestore: Query on substring of a property value (text search)

I agree with @Kuba's answer, But still, it needs to add a small change to work perfectly for search by prefix. here what worked for me

For searching records starting with name queryText

collectionRef.where('name', '>=', queryText).where('name', '<=', queryText+ '\uf8ff').

The character \uf8ff used in the query is a very high code point in the Unicode range (it is a Private Usage Area [PUA] code). Because it is after most regular characters in Unicode, the query matches all values that start with queryText.


There's no such operator, allowed ones are ==, <, <=, >, >=.

You can filter by prefixes only, for example for everything that starts between bar and foo you can use

collectionRef.where('name', '>=', 'bar').where('name', '<=', 'foo')

You can use external service like Algolia or ElasticSearch for that.


While Kuba's answer is true as far as restrictions go, you can partially emulate this with a set-like structure:

{
  'terms': {
    'reebok': true,
    'mens': true,
    'tennis': true,
    'racket': true
  }
}

Now you can query with

collectionRef.where('terms.tennis', '==', true)

This works because Firestore will automatically create an index for every field. Unfortunately this doesn't work directly for compound queries because Firestore doesn't automatically create composite indexes.

You can still work around this by storing combinations of words but this gets ugly fast.

You're still probably better off with an outboard full text search.


While Firebase does not explicitly support searching for a term within a string,

Firebase does (now) support the following which will solve for your case and many others:

As of August 2018 they support array-contains query. See: https://firebase.googleblog.com/2018/08/better-arrays-in-cloud-firestore.html

You can now set all of your key terms into an array as a field then query for all documents that have an array that contains 'X'. You can use logical AND to make further comparisons for additional queries. (This is because firebase does not currently natively support compound queries for multiple array-contains queries so 'AND' sorting queries will have to be done on client end)

Using arrays in this style will allow them to be optimized for concurrent writes which is nice! Haven't tested that it supports batch requests (docs don't say) but I'd wager it does since its an official solution.


Usage:

collection("collectionPath").
    where("searchTermsArray", "array-contains", "term").get()

Full-Text Search, Relevant Search, and Trigram Search!

UPDATE - 2/17/21 - I created several new Full Text Search Options.

See Fireblog.io for details.


Also, side note, dgraph now has websockets for realtime... wow, never saw that coming, what a treat! Slash Dgraph - Amazing!


--Original Post--

A few notes here:

1.) \uf8ff works the same way as ~

2.) You can use a where clause or start end clauses:

ref.orderBy('title').startAt(term).endAt(term + '~');

is exactly the same as

ref.where('title', '>=', term).where('title', '<=', term + '~');

3.) No, it does not work if you reverse startAt() and endAt() in every combination, however, you can achieve the same result by creating a second search field that is reversed, and combining the results.

Example: First you have to save a reversed version of the field when the field is created. Something like this:

// collection
const postRef = db.collection('posts')

async function searchTitle(term) {

  // reverse term
  const termR = term.split("").reverse().join("");

  // define queries
  const titles = postRef.orderBy('title').startAt(term).endAt(term + '~').get();
  const titlesR = postRef.orderBy('titleRev').startAt(termR).endAt(termR + '~').get();

  // get queries
  const [titleSnap, titlesRSnap] = await Promise.all([
    titles,
    titlesR
  ]);
  return (titleSnap.docs).concat(titlesRSnap.docs);
}

With this, you can search the last letters of a string field and the first, just not random middle letters or groups of letters. This is closer to the desired result. However, this won't really help us when we want random middle letters or words. Also, remember to save everything lowercase, or a lowercase copy for searching, so case won't be an issue.

4.) If you have only a few words, Ken Tan's Method will do everything you want, or at least after you modify it slightly. However, with only a paragraph of text, you will exponentially create more than 1MB of data, which is bigger than firestore's document size limit (I know, I tested it).

5.) If you could combine array-contains (or some form of arrays) with the \uf8ff trick, you might could have a viable search that does not reach the limits. I tried every combination, even with maps, and a no go. Anyone figures this out, post it here.

6.) If you must get away from ALGOLIA and ELASTIC SEARCH, and I don't blame you at all, you could always use mySQL, postSQL, or neo4Js on Google Cloud. They are all 3 easy to set up, and they have free tiers. You would have one cloud function to save the data onCreate() and another onCall() function to search the data. Simple...ish. Why not just switch to mySQL then? The real-time data of course! When someone writes DGraph with websocks for real-time data, count me in!

Algolia and ElasticSearch were built to be search-only dbs, so there is nothing as quick... but you pay for it. Google, why do you lead us away from Google, and don't you follow MongoDB noSQL and allow searches?