
Error: 10 ABORTED: Too much contention on these documents. Please try again

What does this error mean?

In particular, what do they mean by "Please try again"?

Does it mean that the transaction failed and I have to re-run it manually? From what I understood from the documentation:

The transaction read a document that was modified outside of the transaction. In this case, the transaction automatically runs again. The transaction is retried a finite number of times.

If so, on which documents? The error does not indicate which document it is talking about. I just get this stack:

{ Error: 10 ABORTED: Too much contention on these documents. Please try again.
    at Object.exports.createStatusError (node_modules\grpc\src\common.js:87:15)
    at ClientReadableStream._emitStatusIfDone (\node_modules\grpc\src\client.js:235:26)
    at ClientReadableStream._receiveStatus (\node_modules\grpc\src\client.js:213:8)
    at Object.onReceiveStatus (\node_modules\grpc\src\client_interceptors.js:1256:15)
    at InterceptingListener._callNext (node_modules\grpc\src\client_interceptors.js:564:42)
    at InterceptingListener.onReceiveStatus (\node_modules\grpc\src\client_interceptors.js:614:8)
    at C:\Users\Tolotra Samuel\PhpstormProjects\CryptOcean\node_modules\grpc\src\client_interceptors.js:1019:24
  code: 10,
  metadata: Metadata { _internal_repr: {} },
  details: 'Too much contention on these documents. Please try again.' }

To recreate this error, just run db.runTransaction in a for loop, as indicated in the documentation.
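For example, something like the following hypothetical sketch (not the original poster's code; counterRef is an illustrative document) will eventually trigger the error once enough transactions hit the same document concurrently:

const admin = require('firebase-admin')
admin.initializeApp()
const db = admin.firestore()

// Illustrative document; any single document hammered by many
// concurrent transactions will do.
const counterRef = db.collection('counters').doc('demo')

const runs = []
for (let i = 0; i < 50; i++) {
  runs.push(db.runTransaction(async transaction => {
    const snap = await transaction.get(counterRef)
    const count = (snap.exists && snap.data().count) || 0
    transaction.set(counterRef, { count: count + 1 })
  }))
}

// With enough parallel runs, some of them eventually fail with
// "10 ABORTED: Too much contention on these documents. Please try again."
Promise.all(runs).catch(err => console.error(err))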

asked Sep 14 '18 by TSR


4 Answers

Firestore re-runs the transaction only a finite number of times. As of writing, this number is hard-coded as 5, and cannot be changed. To avoid congestion/contention when many users are using the same document, normally we use the exponential back-off algorithm (but this will result in transactions taking longer to complete, which may be acceptable in some use cases).

However, as of writing, this has not been implemented in the Firebase SDK yet — transactions are retried right away. Fortunately, we can implement our own exponential back-off algorithm in a transaction:

const createTransactionCollisionAvoider = () => {
  let attempts = 0
  return {
    async avoidCollision() {
      attempts++
      // Random delay between 0 and 2^attempts seconds, using the `delay` npm package.
      await require('delay')(Math.pow(2, attempts) * 1000 * Math.random())
    }
  }
}

…which can be used like this:

// Each time we run a transaction, create a collision avoider.
const collisionAvoider = createTransactionCollisionAvoider()
db.runTransaction(async transaction => {
  // At the very beginning of the transaction run,
  // introduce a random delay. The delay increases each time
  // the transaction has to be re-run.
  await collisionAvoider.avoidCollision()

  // The rest goes as normal.
  const doc = await transaction.get(...)
  // ...
  transaction.set(...)
})

Note: The above example may cause your transaction to take up to 1.5 minutes to complete. This is fine for my use case. You might have to adjust the backoff algorithm for your use case.

answered Nov 07 '22 by Thai


We ran into the same problem with the Firebase Firestore database. Even small counters with fewer than 30 items to count were running into this issue.

Our solution was not to distribute the counter but to increase the number of retries for the transaction and to add a delay between those retries.

The first step was to save the transaction body as a const which can be passed to another function.

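// Note: `taskRef` and `change` come from the surrounding scope; `change`
// looks like the Change object passed to a Cloud Functions Firestore trigger.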
const taskCountTransaction = async transaction => {
  const taskDoc = await transaction.get(taskRef)

  if (taskDoc.exists) {
    let increment = 0
    if (change.after.exists && !change.before.exists) {
      increment = 1
    } else if (!change.after.exists && change.before.exists) {
      increment = -1
    }

    let newCount = (taskDoc.data()['itemsCount'] || 0) + increment
    return await transaction.update(taskRef, { itemsCount: newCount > 0 ? newCount : 0 })
  }

  return null
}

The second step was to create two helper functions: one that waits a specific amount of time, and one that runs the transaction and catches errors. If the abort error with code 10 occurs, we simply run the transaction again, up to a specific number of retries.

const wait = ms => { return new Promise(resolve => setTimeout(resolve, ms))}


// `fs` is the Firestore instance from the surrounding scope.
const runTransaction = async (taskCountTransaction, retry = 0) => {
  try {
    await fs.runTransaction(taskCountTransaction)
    return null
  } catch (e) {
    console.warn(e)
    if (e.code === 10) {
      console.log(`Transaction abort error! Running it again after ${retry} retries.`)

      if (retry < 4) {
        await wait(1000)
        return runTransaction(taskCountTransaction, ++retry)
      }
    }
  }
}

Now that we have everything we need, we can simply call our helper function with await. Our transaction call will run longer than a default one, and the retries will be spread out in time.

await runTransaction(taskCountTransaction)

What I like about this solution is that it doesn't require more or complicated code, and most of the already written code can stay as it is. It also uses more time and resources only once the counter reaches the point where it has to count more items. Otherwise the time and resources are the same as with default transactions.

For scaling up to large numbers of items we can increase either the number of retries or the waiting time. Both also affect the costs for Firebase. For the waiting part we also need to increase the timeout of our function.

DISCLAIMER: I have not stress-tested this code with thousands of items or more. In our specific case the problems started at 20+ items, and we need up to 50 items for a task. I tested it with 200 items and the problem did not appear again.

answered Nov 07 '22 by Tarik Huber


The transaction does run several times if needed, but if the values read continue to be updated before the write or writes can occur, it will eventually fail; hence the documentation's note that the transaction is retried a finite number of times. If you have a value that updates frequently, like a counter, consider other solutions such as distributed counters. If you'd like more specific suggestions, I recommend including the code of your transaction in your question, along with some information about what you're trying to achieve.
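For illustration, here is a minimal sketch of the distributed (sharded) counter pattern described in the Firestore documentation; the names counterRef and NUM_SHARDS are illustrative, and FieldValue.increment assumes a reasonably recent firebase-admin SDK:

const admin = require('firebase-admin')

const NUM_SHARDS = 10

// Writes go to one randomly chosen shard instead of a single hot document,
// so concurrent increments rarely touch the same document.
function incrementCounter(counterRef) {
  const shardId = Math.floor(Math.random() * NUM_SHARDS)
  return counterRef.collection('shards').doc(String(shardId)).set(
    { count: admin.firestore.FieldValue.increment(1) },
    { merge: true }
  )
}

// Reading the total means summing all shards.
async function getCount(counterRef) {
  const snapshot = await counterRef.collection('shards').get()
  return snapshot.docs.reduce((sum, doc) => sum + (doc.data().count || 0), 0)
}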

answered Nov 07 '22 by Jen Person


I have implemented a simple back-off solution to share: maintain a global variable that assigns a different "retry slot" to each failed connection. For example, if 5 connections come in at the same time and 4 of them get a contention error, each gets a delay of 500ms, 1000ms, 1500ms and 2000ms before trying again. So they could potentially all resolve at the same time without any further contention.

My transaction runs in response to calling Firebase Functions. Each Functions compute instance can have a global variable nextRetrySlot that is preserved until it is shut down. So if error.code === 10 is caught for a contention issue, the delay time can be (nextRetrySlot + 1) * 500, and then you could, for example, do nextRetrySlot = (nextRetrySlot + 1) % 10 so the next connections get a different time, round-robin in the 500ms ~ 5000ms range.
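A hypothetical sketch of that idea (the names nextRetrySlot and runWithRetrySlots are illustrative, not taken from the original code):

// `nextRetrySlot` lives at module scope, so it is shared by all invocations
// handled by the same Functions instance until that instance shuts down.
let nextRetrySlot = 0

const runWithRetrySlots = async (db, transactionFn) => {
  try {
    return await db.runTransaction(transactionFn)
  } catch (e) {
    if (e.code === 10) { // ABORTED: too much contention
      const delayMs = (nextRetrySlot + 1) * 500 // 500ms ~ 5000ms
      nextRetrySlot = (nextRetrySlot + 1) % 10  // round-robin over 10 slots
      await new Promise(resolve => setTimeout(resolve, delayMs))
      return runWithRetrySlots(db, transactionFn)
    }
    throw e
  }
}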

Below are some benchmarks:

My situation is that I would like each new Firebase Auth registration to get a much shorter ID derived from the unique Firebase UID, so there is a risk of collision.

My solution is simply to check all registered short IDs and, if the query returns something, generate another one until it doesn't. Then we register this new short ID in the database. So the algorithm cannot rely only on the Firebase UID, but it is able to "move to the next one" in a deterministic way (not just randomize again).

This is my transaction: it first reads the database of all used short IDs, then writes the new one atomically, to prevent the extremely unlikely event that 2 new registrations come in at the same time with different Firebase UIDs that derive into the same short ID, and both see that the short ID is vacant at the same time.
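In outline it looks something like this (a hypothetical sketch, not the exact code; deriveShortId is an assumed deterministic helper):

// All reads happen before the single write, as Firestore transactions require.
const registerShortId = (db, uid) => db.runTransaction(async transaction => {
  let attempt = 0
  let shortId = deriveShortId(uid, attempt) // assumed deterministic derivation
  // Probe until we find a short ID that is not taken yet.
  while (true) {
    const existing = await transaction.get(db.collection('shortIds').doc(shortId))
    if (!existing.exists) break
    attempt++
    shortId = deriveShortId(uid, attempt) // deterministically "move to the next one"
  }
  // Claim the short ID; if another transaction wrote it concurrently,
  // this transaction is retried with fresh reads.
  transaction.set(db.collection('shortIds').doc(shortId), { uid })
  return shortId
})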

I ran a test that intentionally registers 20 different Firebase UIDs which all derive into the same short ID (an extremely unlikely situation), all running in a burst at the same time. First I tried using the same delay on every retry, so I expected them to clash with each other again and again while slowly resolving some connections.

  • Same 500ms delay on retry : 45000ms ~ 60000ms
  • Same 1000ms delay on retry : 30000ms ~ 49000ms
  • Same 1500ms delay on retry : 43000ms ~ 49000ms

Then with the delay times distributed across slots:

  • 500ms * 5 slots on retry : 20000ms ~ 31000ms
  • 500ms * 10 slots on retry : 22000ms ~ 23000ms
  • 500ms * 20 slots on retry : 19000ms ~ 20000ms
  • 1000ms * 5 slots on retry : ~29000ms
  • 1000ms * 10 slots on retry : ~25000ms
  • 1000ms * 20 slots on retry : ~26000ms

This confirms that distributing the delay times definitely helps.

answered Nov 07 '22 by 5argon