 

What is the fastest way to write a lot of documents to Firestore?

I need to write a large number of documents to Firestore.

What is the fastest way to do this in Node.js?

Asked Nov 17 '19 by Frank van Puffelen

People also ask

How many documents can Firestore handle?

There is no limit on the number of documents in a Firestore collection, but there is a limit on document size: a document can be at most roughly 1 MiB (1,048,576 bytes).

What is the maximum number of documents that can be written per transaction or batch of writes in Cloud Firestore?

Each transaction or batch of writes can write to a maximum of 500 documents.

Is Firestore faster?

Cloud Firestore also features richer, faster queries and scales further than the Realtime Database. Realtime Database is Firebase's original database. It's an efficient, low-latency solution for mobile apps that require synced states across clients in realtime.


1 Answer

TL;DR: The fastest way to perform bulk data creation on Firestore is by performing parallel individual write operations.

Writing 1,000 documents to Firestore takes:

  1. ~105.4s when using sequential individual write operations
  2. ~ 2.8s when using (2) batched write operations
  3. ~ 1.5s when using parallel individual write operations

There are three common ways to perform a large number of write operations on Firestore.

  1. Performing each individual write operation in sequence.
  2. Using batched write operations.
  3. Performing individual write operations in parallel.

We'll investigate each in turn below, using an array of randomized document data.
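The datas array itself is incidental to the benchmark. A hypothetical generator like the following (the field names are illustrative, not taken from the original test) produces the randomized documents:

```javascript
// Hypothetical generator for the benchmark's randomized document data.
// The field names (name, value) are illustrative; any small object works.
function makeTestDatas(count) {
  const datas = [];
  for (let i = 0; i < count; i++) {
    datas.push({
      name: Math.random().toString(36).substring(2, 15),
      value: Math.floor(Math.random() * 1000),
    });
  }
  return datas;
}
```

Each test below would then be called as, for example, testSequentialIndividualWrites(makeTestDatas(1000)).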


Individual sequential write operations

This is the simplest possible solution:

async function testSequentialIndividualWrites(datas) {
  while (datas.length) {
    await collection.add(datas.shift());
  }
}

We write each document in turn, until we've written every document. And we wait for each write operation to complete before starting on the next one.

Writing 1,000 documents takes about 105 seconds with this approach, so the throughput is roughly 10 document writes per second.


Using batched write operations

This is the most complex solution.

async function testBatchedWrites(datas) {
  let batch = admin.firestore().batch();
  let count = 0;
  while (datas.length) {
    batch.set(collection.doc(Math.random().toString(36).substring(2, 15)), datas.shift());
    if (++count >= 500 || !datas.length) {
      await batch.commit();
      batch = admin.firestore().batch();
      count = 0;
    }
  }
}

You can see that we create a write batch by calling batch(), fill it up to its maximum capacity of 500 documents, and then write it to Firestore. We give each document a generated name that is reasonably likely to be unique (good enough for this test).

Writing 1,000 documents takes about 2.8 seconds with this approach, so the throughput is roughly 357 document writes per second.

That's quite a bit faster than the sequential individual writes. In fact, many developers use this approach because they assume it is the fastest, but as the results above show, this is not true. And the code is by far the most complex, due to the size constraint on batches.
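That size constraint is easy to quantify: with Firestore's limit of 500 writes per batch, N documents always need ⌈N / 500⌉ commits. A quick sketch:

```javascript
// Number of batch commits needed for a given document count,
// given Firestore's 500-writes-per-batch limit.
function batchesNeeded(docCount, limit = 500) {
  return Math.ceil(docCount / limit);
}

console.log(batchesNeeded(1000)); // 2 commits for the 1,000-document test
console.log(batchesNeeded(1001)); // 3 commits: one extra for the single overflow document
```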


Parallel individual write operations

The Firestore documentation says this about the performance for adding lots of data:

For bulk data entry, use a server client library with parallelized individual writes. Batched writes perform better than serialized writes but not better than parallel writes.

We can put that to the test with this code:

async function testParallelIndividualWrites(datas) {
  await Promise.all(datas.map((data) => collection.add(data)));
}

This code kicks off the add operations as fast as it can, then uses Promise.all() to wait until they've all finished. With this approach the operations run in parallel.

Writing 1,000 documents takes about 1.5 seconds with this approach, so the throughput is roughly 667 document writes per second.

The difference isn't nearly as great as between the first two approaches, but it still is over 1.8 times faster than batched writes.
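Note that Promise.all() starts all 1,000 writes at once. If you ever need to cap the number of in-flight writes, say to stay within rate limits on much larger imports, one hedged variant (not part of the original test) is to split the array into fixed-size chunks and run one chunk at a time:

```javascript
// Split an array into chunks of at most `size` elements.
function chunk(array, size) {
  const chunks = [];
  for (let i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size));
  }
  return chunks;
}

// Hypothetical variant: parallel writes, but at most `concurrency`
// in flight at once. Assumes the same `collection` as the tests above.
async function testChunkedParallelWrites(datas, concurrency = 100) {
  for (const group of chunk(datas, concurrency)) {
    await Promise.all(group.map((data) => collection.add(data)));
  }
}
```

With concurrency at or above the array length this degenerates into the plain parallel test; smaller values trade some throughput for more predictable load.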


A few notes:

  • You can find the full code of this test on GitHub.
  • While the test was done with Node.js, you're likely to get similar results across all platforms that the Admin SDK supports.
  • Don't perform bulk inserts using client SDKs though, as the results may be very different and much less predictable.
  • As usual, the actual performance depends on your machine, the bandwidth and latency of your internet connection, and many other factors. The exact gaps between the approaches may vary as well, although I expect the ordering to remain the same.
  • If you have any outliers in your own tests, or find completely different results, leave a comment below.
  • Batched writes are atomic. So if you have dependencies between the documents and all documents must be written, or none of them must be written, you should use a batched write.
Answered Sep 28 '22 by Frank van Puffelen