
HTTP GET/POST into Dataflow

I am trying to get some data into Dataflow, but the data is not located on Cloud Storage: it is an RSS feed that I would normally check every x hours. Is there a way to read it directly using the SDK, or do I have to get the files onto Cloud Storage some other way first?

Thanks in advance.

billy1380 asked May 02 '26 20:05



1 Answer

Dataflow doesn't provide a source for an RSS feed.

You could issue HTTP requests from a ParDo to fetch the data, though. For example, suppose the feed lets you fetch messages within a given time range. You could then create an input collection in which each record represents a range of time (e.g. an hour), and write a ParDo that fetches the messages in each range and emits them.
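
A rough sketch of that idea, written against the Apache Beam Python SDK rather than the Java SDK the question may have in mind. Everything specific here is an assumption made for illustration: the `FEED_URL` endpoint, the `from`/`to` query parameters, and the helper names `hourly_ranges`, `http_get`, `fetch_range` and `run` are all hypothetical, and a real feed may not support range queries at all.

```python
import datetime as dt
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical endpoint; replace with the real feed URL.
FEED_URL = "https://example.com/feed"


def hourly_ranges(start, end):
    """Yield (range_start, range_end) pairs covering [start, end) in one-hour steps."""
    step = dt.timedelta(hours=1)
    cur = start
    while cur < end:
        yield cur, min(cur + step, end)
        cur += step


def http_get(url):
    """Plain HTTP GET that returns the response body as bytes."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()


def fetch_range(time_range, get=http_get):
    """Fetch the feed for one time range and yield one dict per RSS <item>.

    The "from"/"to" query parameters are an assumption about the feed's API.
    """
    start, end = time_range
    query = urllib.parse.urlencode(
        {"from": start.isoformat(), "to": end.isoformat()}
    )
    root = ET.fromstring(get(f"{FEED_URL}?{query}"))
    for item in root.iter("item"):
        yield {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
        }


def run(start, end):
    """Build and run the pipeline: one input record per hour, fetched in parallel."""
    import apache_beam as beam  # imported lazily so the helpers work without Beam

    with beam.Pipeline() as p:
        (
            p
            | "Ranges" >> beam.Create(list(hourly_ranges(start, end)))
            | "Fetch" >> beam.FlatMap(fetch_range)  # a ParDo that emits 0..n items
            | "Print" >> beam.Map(print)
        )
```

`beam.FlatMap` is used here because it is a thin wrapper over a ParDo whose function may emit several outputs per input. Passing the HTTP getter in as a parameter keeps `fetch_range` easy to exercise with a canned response before pointing it at the live feed.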

If you are part of the streaming early access preview, another option is to write an App Engine app (or equivalent) that checks the RSS feed every X hours and publishes the data to Google Cloud Pub/Sub. You could then use PubSubIO to read those events in Dataflow.
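
A minimal sketch of the publishing side and the matching read, again in Python as an assumption. `ReadFromPubSub` is the Beam Python counterpart to the PubSubIO mentioned above. The helper names (`publish_new_items`, `make_pubsub_publisher`, `read_pipeline`) and the choice of an item's `link` as its identity are hypothetical, and the periodic scheduling (cron, App Engine, or similar) is left out.

```python
import json


def publish_new_items(items, seen_ids, publish):
    """Publish each feed item not seen before; return how many were published.

    `items` is an iterable of dicts with a "link" key (used here as the item's
    identity, which is an assumption), `seen_ids` is a mutable set that persists
    between polls, and `publish` is any callable that accepts bytes.
    """
    published = 0
    for item in items:
        item_id = item["link"]
        if item_id in seen_ids:
            continue  # already sent on an earlier poll
        publish(json.dumps(item, sort_keys=True).encode("utf-8"))
        seen_ids.add(item_id)
        published += 1
    return published


def make_pubsub_publisher(project_id, topic_id):
    """Return a bytes -> message-id callable backed by Cloud Pub/Sub."""
    from google.cloud import pubsub_v1  # lazy import: needs google-cloud-pubsub

    client = pubsub_v1.PublisherClient()
    topic_path = client.topic_path(project_id, topic_id)
    # .result() blocks until the service acknowledges the message.
    return lambda data: client.publish(topic_path, data=data).result()


def read_pipeline(topic_path):
    """Streaming pipeline that reads the published items back as dicts."""
    import apache_beam as beam  # lazy import: needs apache-beam[gcp]
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(streaming=True)
    with beam.Pipeline(options=options) as p:
        (
            p
            | "Read" >> beam.io.ReadFromPubSub(topic=topic_path)
            | "Decode" >> beam.Map(lambda raw: json.loads(raw.decode("utf-8")))
            | "Print" >> beam.Map(print)
        )
```

The in-memory `seen_ids` set only deduplicates within one process; a poller that restarts between runs would need to keep that state somewhere durable.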

Jeremy Lewi answered May 04 '26 19:05



