Firebase recently released an integration with Cloud Functions that lets us deploy JavaScript functions to run without needing our own servers.
Is it possible to build a search engine using those functions? My idea is to use the local disk (a tmpfs volume) to keep the indexed data in memory, and to index the new data on each write event. Does tmpfs keep data between function calls (instances)?
Can Cloud Functions be used for this purpose, or should I use a dedicated server for indexing data?
A related question: when a Cloud Function reads data from the Firebase Realtime Database, does that consume network bandwidth or just local disk reads? How is it counted in pricing?
Thanks
You could certainly try that. Cloud Functions have a local file system, which is typically used to maintain state during a single invocation. See this answer for more: Write temporary files from Google Cloud Function
But there are (as far as I know) no guarantees that state will be maintained between runs of your function. Or even that the function will be running on the same container next time. You may be running on a newly created container next time. Or when there's a spike in invocations, your function may be running on multiple containers at once. So you'd potentially have to rebuild the search index for every run of your function.
I would instead look at integrating an external dedicated search engine, such as Algolia in this example: https://github.com/firebase/functions-samples/tree/master/fulltext-search. Have a look at the code: even with comments and license it's only 55 lines!
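The core of that sample is a database `onWrite` trigger that turns the written record into an Algolia object and saves it. As a hedged sketch, here is just the mapping step reduced to a pure function; the trigger path and field names are hypothetical, and in the real sample the resulting record is sent with the Algolia client's `saveObject`.

```javascript
// Sketch of the indexing step from the Algolia sample, reduced to a pure
// function so the wiring is clear. In the sample this runs inside a
// functions.database.ref(...).onWrite handler; the field names here
// (text, author) are hypothetical.
function toSearchRecord(key, post) {
  // Algolia identifies records by objectID; reusing the database key means
  // an update to the same post overwrites its old search record.
  return {
    objectID: key,
    text: post.text,
    author: post.author,
  };
}
```

Because indexing happens in the external service, the function itself stays stateless, which fits the container caveats above.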
Alternatively you could find a persistent storage service (Firebase Database and Firebase Storage being two examples) and use that to persist the search index. So you'd run the code to update the search index in Cloud Functions, but would store the resulting index files in a more persistent location.
GCF team member and former Google Search team member here. Cloud Functions would not be suitable for an in-memory search engine, for a few reasons.
A search engine is well advised to separate its indexing and serving machines. At scale you'll want to handle read and write hot-spotting differently.
As Frank alluded to, you're not guaranteed to get the same instance across multiple requests. I'd like to strengthen his concern: you will never get the same instance across two different Cloud Functions. Each Cloud Function has its own backend infrastructure that is provisioned and scaled independently.
I know it's tempting to cut dependencies, but cutting out persistent storage isn't the way. Your serving layer can use caching to speed up requests, but durable storage ensures you don't have to reindex the whole corpus if your Cloud Function crashes or you deploy an update (either of which scraps and recreates the whole instance).