
Google PubSub : How to customize distribution of messages to consumers?

I have a scenario where we send customer data to Pub/Sub and consume it with Java subscribers. Multiple subscribers are attached to the same subscription. Is there a way to route all messages with the same customerID to the same subscriber?

I know Google Dataflow has session-based windowing. However, I want to know whether this can be achieved with plain Java consumers.

Asked by rhg, Feb 04 '23 21:02


1 Answer

Update June 2020: Filtering is now an available feature in Google Cloud Pub/Sub. When creating a subscription, one can specify a filter that looks at message attributes. If a message does not match the filter, the Pub/Sub service automatically acknowledges the message without delivering it to the subscriber.

In this case, you would need separate subscriptions, with each subscriber consuming messages from exactly one of them. Each subscription would have a filter that matches on the customer ID, which the publisher must set as a message attribute. If you know the list of customer IDs and it is short, you would set up an exact-match filter for each customer ID, e.g.,

attributes.customerID = "customerID1"

If you have a lot of customer IDs and want to partition the set of IDs received by each subscriber, you could use the hasPrefix function to do so. For example, if the IDs are numbers, you could have filters such as:

hasPrefix(attributes.customerID, "0")
hasPrefix(attributes.customerID, "1")
hasPrefix(attributes.customerID, "2")
hasPrefix(attributes.customerID, "3")
...
hasPrefix(attributes.customerID, "9")
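
As a rough illustration, the filter strings above could be generated in Java rather than typed by hand. This is only a sketch: the class name CustomerFilters and its methods are hypothetical helpers, not part of the Pub/Sub client library, and it assumes the publisher sets an attribute with the key customerID. It uses the attributes. prefix from the Pub/Sub filter syntax and simply rejects values that would need escaping. Each resulting string would then be supplied as the filter when the corresponding subscription is created.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds Pub/Sub subscription filter expressions on a message attribute (hypothetical helper). */
final class CustomerFilters {

    // Assumed attribute key; use whatever key the publisher actually sets on each message.
    static final String ATTRIBUTE_KEY = "customerID";

    /** Exact-match filter for a single customer ID. */
    static String exactMatch(String customerId) {
        return "attributes." + ATTRIBUTE_KEY + " = \"" + requireSimple(customerId) + "\"";
    }

    /** Prefix filter, for partitioning many IDs by their leading characters. */
    static String prefixMatch(String prefix) {
        return "hasPrefix(attributes." + ATTRIBUTE_KEY + ", \"" + requireSimple(prefix) + "\")";
    }

    /** One prefix filter per decimal digit, i.e. ten subscriptions for numeric IDs. */
    static List<String> digitPartitions() {
        List<String> filters = new ArrayList<>();
        for (char digit = '0'; digit <= '9'; digit++) {
            filters.add(prefixMatch(String.valueOf(digit)));
        }
        return filters;
    }

    // This sketch does not escape anything, so it rejects values containing quotes or backslashes.
    private static String requireSimple(String value) {
        if (value.isEmpty() || value.contains("\"") || value.contains("\\")) {
            throw new IllegalArgumentException("unsupported filter value: " + value);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(exactMatch("customerID1"));
        digitPartitions().forEach(System.out::println);
    }
}
```

With prefix partitioning, every possible leading character needs its own subscription; a message whose customerID starts with a character that no filter covers would not reach any of these subscribers.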

Previous answer:

At this time, Google Cloud Pub/Sub has no way to filter the messages delivered to particular subscribers. If you know a priori how many subscribers you have, you could do it yourself, though. You could create as many topics as you have subscribers and then bucket customer IDs into different topics, publishing each message to the topic that corresponds to its customer ID. You'd create a single subscription on each topic, and each subscriber would receive messages from one of these subscriptions.

The disadvantage is that if you have any subscribers that want the data for all customer IDs, then you'll have to have an additional subscription on each topic and that subscriber will have to get messages from all of those subscriptions.

Keep in mind that you won't want to create more than 10,000 topics or else you may run up against quotas.
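
The bucketing step in this older approach could look something like the following on the publisher side. This is a minimal sketch under stated assumptions: TopicRouter, the topic name prefix, and the choice of String.hashCode() as the bucketing function are all illustrative and not something the answer prescribes. Any stable function from customer ID to bucket would do.

```java
/** Maps a customer ID to one of a fixed number of topics (hypothetical helper). */
final class TopicRouter {

    private final String topicPrefix; // assumed naming scheme: <prefix>-<bucket>
    private final int topicCount;     // one topic (and one subscriber) per bucket

    TopicRouter(String topicPrefix, int topicCount) {
        if (topicCount <= 0) {
            throw new IllegalArgumentException("topicCount must be positive");
        }
        this.topicPrefix = topicPrefix;
        this.topicCount = topicCount;
    }

    /** Bucket index in [0, topicCount); floorMod keeps it non-negative for negative hash codes. */
    int bucketFor(String customerId) {
        return Math.floorMod(customerId.hashCode(), topicCount);
    }

    /** Name of the topic that every message for this customer should be published to. */
    String topicFor(String customerId) {
        return topicPrefix + "-" + bucketFor(customerId);
    }

    public static void main(String[] args) {
        TopicRouter router = new TopicRouter("customer-data", 4);
        System.out.println(router.topicFor("customerID1"));
    }
}
```

Because String.hashCode() is defined by the Java language specification, the same customer ID maps to the same topic on every publisher instance. Changing the number of topics remaps most customers to different buckets, which is one reason this approach needs the subscriber count to be known in advance.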

Answered by Kamal Aboul-Hosn, Feb 13 '23 05:02