I'd like to play around with Google Cloud Pub/Sub and processing messages in Dataflow. Are there any public data feeds in Pub/Sub that I can use to get started?
In the Dataflow WordCount example, input is read from a file in Cloud Storage, gs://dataflow-samples/shakespeare/kinglear.txt. It seems that dataflow-samples is accessible to all projects, which is very convenient for getting started. Is there anything similar for Pub/Sub?
Google Cloud Pub/Sub is designed to provide reliable, many-to-many, asynchronous messaging between applications. Publisher applications send messages to a "topic", and other applications subscribe to that topic to receive the messages.
A topic is a named resource to which publishers send messages; it is not the message itself. To receive messages, an application creates a subscription attached to the topic. Each subscription receives its own copy of the messages published after it was created, so several independent consumers can read the same topic without interfering with each other.
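The relationship between a topic and its subscriptions can be pictured with a toy in-memory model. This is only an illustration of the fan-out behaviour, not how the Pub/Sub service is implemented; the `Topic` class and its method names are made up for this sketch.

```python
from collections import deque


class Topic:
    """Toy stand-in for a Pub/Sub topic (hypothetical, in-memory only)."""

    def __init__(self, name):
        self.name = name
        self.subscriptions = {}  # subscription name -> queue of pending messages

    def subscribe(self, subscription_name):
        # Each subscription gets its own queue, independent of the others.
        queue = deque()
        self.subscriptions[subscription_name] = queue
        return queue

    def publish(self, message):
        # Fan out: every existing subscription receives its own copy.
        for queue in self.subscriptions.values():
            queue.append(message)


topic = Topic("rides")
dashboard = topic.subscribe("dashboard")
archiver = topic.subscribe("archiver")
topic.publish("ride-1")
topic.publish("ride-2")

# Consuming from one subscription does not affect the other.
first_for_dashboard = dashboard.popleft()
```

After this runs, `dashboard` still holds one pending message while `archiver` holds both, which is the property that lets many consumers share one public topic.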
Google currently maintains a public topic, projects/pubsub-public-data/topics/taxirides-realtime, as part of a Cloud Dataflow codelab.
You can find more information on how to use it here.
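As a rough starting point, a streaming Apache Beam pipeline that reads the public topic might look like the sketch below. It assumes the `apache-beam[gcp]` package is installed, that the message payloads are UTF-8 encoded JSON, and that you pass your own project and runner options; `parse_message` and `run` are names chosen for this example.

```python
import json

PUBLIC_TOPIC = "projects/pubsub-public-data/topics/taxirides-realtime"


def parse_message(data: bytes) -> dict:
    """Decode one Pub/Sub payload (assumed here to be UTF-8 encoded JSON)."""
    return json.loads(data.decode("utf-8"))


def run(argv=None):
    # Imported lazily so the helper above can be used without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import (
        PipelineOptions,
        StandardOptions,
    )

    options = PipelineOptions(argv)
    # Pub/Sub is an unbounded source, so the pipeline must run in streaming mode.
    options.view_as(StandardOptions).streaming = True

    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadTaxiRides" >> beam.io.ReadFromPubSub(topic=PUBLIC_TOPIC)
            | "ParseJson" >> beam.Map(parse_message)
            | "Print" >> beam.Map(print)
        )


# To start the pipeline, call run() with your own options, for example:
# run(["--project=YOUR_PROJECT_ID", "--runner=DataflowRunner", ...])
```

Reading by topic rather than by subscription means the runner needs a subscription of its own in your project, so the account running the job needs permission to create one. The pipeline runs until it is cancelled.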
Additionally, you can use Dataflow with BigQuery, for which Google provides a comprehensive set of public datasets.