How to prevent duplicated msg from happening in Google Cloud PubSub?
Say, I have a code that handles the msg that it is subscribed for.
Say, I have 2 nodes with the same Service that has this code.
Once one has received the msg but not yet acknowledged it, another node will receive the same message. And this is where there's the problem that we have two duplicated msgs.
void messageReceiver(PubsubMessage pubsubMessage, AckReplyConsumer ackReply) {
submitHandler.handle(toMessage(pubsubMessage))
.doOnSuccess((response) -> {
log.info("Acknowledging the successfully processed message id: {}, response {}", pubsubMessage.getMessageId(), response);
ackReply.ack(); // <---- acknowledged
})
.doOnError((e) -> {
log.error("Not acknowledging due to an exception", e);
ackReply.nack();
})
.doOnTerminate(span::finish)
.subscribe();
}
What is the solution for this? Is it normal behaviour?
Google Cloud Pub/Sub uses "At-Least-Once" delivery. From the docs:
Typically, Cloud Pub/Sub delivers each message once and in the order in which it was published. However, messages may sometimes be delivered out of order or more than once. In general, accommodating more-than-once delivery requires your subscriber to be idempotent when processing messages.
This means it guarantees it will deliver the message 1:N times, so you can potentially get the message multiple times if you don't pipe it through something else that deduplicates it first. There isn't a setting you can define to guarantee exactly once delivery. The docs do reference you can get the behavior you desire using Cloud Dataflow's PubSubIO
, but that solution appears to be deprecated:
You can achieve exactly once processing of Cloud Pub/Sub message streams using Cloud Dataflow
PubsubIO
. PubsubIO de-duplicates messages on custom message identifiers or those assigned by Cloud Pub/Sub.
Saying all of this, I've never actually seen Google Cloud Pub/Sub send a message twice. Are you sure that's really the problem you're having, or is the message being reissued because you are not acknowledging the message within the Acknowledgement Deadline (as you stated above, this defaults to 10 seconds). If you don't acknowledge it, it will get reissued. From the docs (emphasis mine):
A subscription is created for a single topic. It has several properties that can be set at creation time or updated later, including:
- An acknowledgment deadline: If your code doesn't acknowledge the message before the deadline, the message is sent again. The default is 10 seconds. The maximum custom deadline you can specify is 600 seconds (10 minutes).
If that's the situation, just acknowledge your messages within the deadline and you won't see these duplicates as often.
You can use Redis from Memorystore in order to deduplicate messages. Your publisher should add trace iD to the message body just before publishing it to PubSub. On the other side client (subscriber) should check if the trace ID is in the cache - skip the message. If there is no such message - process the message and add trace ID to cache with 7-8 days expiry time (PubSub deadline is 7 days). In such a simple way You can grant the correct messages received.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With