
How to make idempotent aggregation in Cloud Functions?

I'm working on a Firebase Cloud Function that updates aggregate information on some documents in my DB. It's a very simple function that just adds 1 to a total document count, much like the example function found in the Firestore documentation.

I just noticed that when creating a single new document, the function was invoked twice. See the screenshot below and note that the logged document ID (iDup09btyVNr5fHl6vif) appears twice:

[Screenshot: function logs showing document ID iDup09btyVNr5fHl6vif logged twice]

After a bit of digging around I found this SO post that says the following:

Delivery of function invocations is not currently guaranteed. As the Cloud Firestore and Cloud Functions integration improves, we plan to guarantee "at least once" delivery. However, this may not always be the case during beta. This may also result in multiple invocations for a single event, so for the highest quality functions ensure that the functions are written to be idempotent.

(From Firestore documentation: Limitations and guarantees)

This leads me to a problem with their documentation. As mentioned above, Cloud Functions are meant to be idempotent (in other words, the data they alter should end up the same whether the function runs once or multiple times). However, the example function I linked to earlier is, to my eyes, not idempotent:

// Assumes `firestore` is functions.firestore and `db` is admin.firestore()
exports.aggregateRatings = firestore
  .document('restaurants/{restId}/ratings/{ratingId}')
  .onWrite(event => {
    // Get value of the newly added rating
    var ratingVal = event.data.get('rating');

    // Get a reference to the restaurant
    var restRef = db.collection('restaurants').doc(event.params.restId);

    // Update aggregations in a transaction
    return db.runTransaction(transaction => {
      return transaction.get(restRef).then(restDoc => {
        // Compute new number of ratings
        var newNumRatings = restDoc.data().numRatings + 1;

        // Compute new average rating
        var oldRatingTotal = restDoc.data().avgRating * restDoc.data().numRatings;
        var newAvgRating = (oldRatingTotal + ratingVal) / newNumRatings;

        // Update restaurant info
        return transaction.update(restRef, {
          avgRating: newAvgRating,
          numRatings: newNumRatings
        });
      });
    });
  });

If the function runs once, the aggregate data is updated as if one rating had been added, but if it runs again for the same rating, the aggregate data is updated as if two ratings had been added.
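To make the concern concrete, here is a minimal sketch that applies the same arithmetic to a plain object instead of a Firestore document. The `addRating` helper and the starting values are made up for illustration and are not part of the documentation example.

```javascript
// Same arithmetic as the aggregation above, applied to a plain object.
// `addRating` is a hypothetical helper, not a Firebase API.
function addRating(agg, ratingVal) {
  const newNumRatings = agg.numRatings + 1;
  const oldRatingTotal = agg.avgRating * agg.numRatings;
  return {
    avgRating: (oldRatingTotal + ratingVal) / newNumRatings,
    numRatings: newNumRatings
  };
}

// Hypothetical starting state: two ratings averaging 4.
const start = { avgRating: 4, numRatings: 2 };

// Delivered once: the new rating of 1 is counted a single time.
const once = addRating(start, 1);

// Delivered twice for the same rating: it is counted again.
const twice = addRating(once, 1);

console.log(once);  // { avgRating: 3, numRatings: 3 }
console.log(twice); // { avgRating: 2.5, numRatings: 4 }
```

The second invocation changes both the count and the average, which is exactly what an idempotent function would not do.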

Unless I'm misunderstanding the concept of idempotence, this seems to be a problem.

Does anyone have any ideas of how to increase / decrease aggregate data in Cloud Firestore via Cloud Functions in a way that is idempotent?

(And of course doesn't involve querying every single document the aggregate data is regarding)

Bonus points: Does anyone know if functions will still need to be idempotent after Cloud Firestore is out of beta?

saricden asked Mar 11 '18 05:03



1 Answer

The Cloud Functions documentation gives some guidance on how to make retryable background functions idempotent. The bullet point you're most likely to be interested in here is:

Impose a transactional check outside the function, independent of the code. For example, persist state somewhere recording that a given event ID has already been processed.

The event parameter passed to your function has an eventId property that is unique to each event but stays the same when an event is retried. You should use this value to determine whether the action triggered by an event has already occurred, so that you know to skip it the second time, if necessary.

As for how exactly to check whether an event ID has already been processed by your function, there are many ways to do it, and the choice is up to you.
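As one possibility among many, the check could look roughly like the sketch below. It uses an in-memory object and a Set as stand-ins for Firestore, and the names `restaurant`, `processedEventIds`, and `applyRatingOnce` are hypothetical; only the eventId idea comes from the text above.

```javascript
// In-memory stand-ins for Firestore data (hypothetical, for illustration only).
const restaurant = { avgRating: 4, numRatings: 2 };
const processedEventIds = new Set();

// Applies a rating to the aggregate unless this event ID was already handled.
// Returns true if the aggregate changed, false if the event was a repeat.
function applyRatingOnce(eventId, ratingVal) {
  if (processedEventIds.has(eventId)) {
    return false; // Repeat delivery: skip the update.
  }
  const newNumRatings = restaurant.numRatings + 1;
  const oldRatingTotal = restaurant.avgRating * restaurant.numRatings;
  restaurant.avgRating = (oldRatingTotal + ratingVal) / newNumRatings;
  restaurant.numRatings = newNumRatings;
  processedEventIds.add(eventId); // Record the event alongside the update.
  return true;
}

const first = applyRatingOnce('evt-1', 1);
const second = applyRatingOnce('evt-1', 1); // Same event delivered again.
console.log(first, second, restaurant);
```

In a real function, the lookup of the event ID and the aggregate update would most likely need to happen inside the same Firestore transaction, so that two concurrent deliveries cannot both pass the check. Note also that in firebase-functions 1.0 and later the ID is read from `context.eventId` rather than from the event object.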

You can always opt out of making your function idempotent if you think it's simply not worthwhile, or it's OK to possibly have incorrect counts in some (probably rare) cases.

Doug Stevenson answered Oct 25 '22 06:10