We have an application running on Google App Engine using Datastore as the persistence back-end. Currently the application has mostly 'OLTP' features and some rudimentary reporting. While implementing reports, we found that processing large amounts of data (millions of objects) is very difficult using Datastore and GQL. To enhance our application with proper reports and Business Intelligence features, we think it's better to set up an ETL process to move data from Datastore to BigQuery.
Initially we thought of implementing the ETL process as an App Engine cron job, but it looks like Dataflow can also be used for this. We have the following requirements for setting up the process
My questions are
Are these two approaches doable? Which one is better cost-wise? Is there any other way that is better than the above two?
Thank you,
rizTaak
Yes! Here is what you need to know: when you write an Apache Beam pipeline, your processing logic lives in the DoFns that you create. These functions can call any logic you want.
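To make that concrete, here is a minimal sketch of the kind of plain logic a DoFn could call for each element. The function name `entity_to_row`, its arguments, and the one-level flattening convention are all assumptions for illustration, not part of any Google or Beam API; in an actual Beam pipeline you would call something like this from the `process` method of your own `DoFn` subclass.

```python
from datetime import datetime, timezone


def entity_to_row(kind, key_id, properties):
    """Flatten one Datastore-style entity into a flat dict (hypothetical helper).

    Assumptions: `properties` is a plain dict, nested dicts are flattened
    one level deep, and naive datetimes are treated as UTC.
    """
    row = {"kind": kind, "id": key_id}
    for name, value in properties.items():
        if isinstance(value, datetime):
            # Treat naive datetimes as UTC, then emit an ISO 8601 string.
            if value.tzinfo is None:
                value = value.replace(tzinfo=timezone.utc)
            row[name] = value.astimezone(timezone.utc).isoformat()
        elif isinstance(value, dict):
            # Flatten embedded entities one level: customer.name -> customer_name
            for sub_name, sub_value in value.items():
                row[f"{name}_{sub_name}"] = sub_value
        else:
            row[name] = value
    return row


if __name__ == "__main__":
    example = entity_to_row(
        "Order",
        42,
        {"total": 9.5, "created": datetime(2020, 1, 2, 3, 4, 5), "customer": {"name": "Ann"}},
    )
    print(example)
```

Keeping the conversion in an ordinary function like this means it can be unit tested without running a pipeline, and the same code could be reused from a cron job if you go that way instead.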
BigQuery is a serverless, scalable cloud-based data warehouse provided by Google Cloud Platform. It is a fully managed warehouse that allows users to perform ETL on the data with the help of SQL queries.
Dataflows allow setting up a complete self-service ETL process that lets teams across an organization not only ingest data from a variety of sources, such as Salesforce, SQL Server, and Dynamics 365, but also convert it into an analysis-ready form.
Dataflow can absolutely be used for this purpose. In fact, Dataflow's scalability should make the process fast and relatively easy.
Both of your approaches should work -- I'd prefer the second one: use a batch pipeline to move the existing data, and then a streaming pipeline to handle new data via Cloud Pub/Sub. In addition to moving the data, Dataflow allows arbitrary analytics/manipulation to be performed on the data itself.
That said, BigQuery and Datastore can be connected directly. See, for example, Loading Data From Cloud Datastore in the BigQuery documentation.
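For the streaming half, each Pub/Sub message arrives as a bytes payload that has to be turned into a row before it is written. The sketch below assumes your application publishes change events as UTF-8 JSON with `kind`, `id`, and `properties` fields; that message format and the name `decode_change_event` are hypothetical, so adapt them to whatever you actually publish.

```python
import json


def decode_change_event(payload):
    """Decode one message payload (bytes) into a flat row dict.

    Returns None for malformed messages so the caller can skip them or
    route them elsewhere. The event layout is an assumed convention.
    """
    try:
        event = json.loads(payload.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None
    if not isinstance(event, dict) or "kind" not in event or "id" not in event:
        return None
    row = {"kind": event["kind"], "id": event["id"]}
    props = event.get("properties")
    if isinstance(props, dict):
        row.update(props)
    return row


if __name__ == "__main__":
    print(decode_change_event(b'{"kind": "Order", "id": 7, "properties": {"total": 3}}'))
    print(decode_change_event(b"not json"))
```

Returning `None` instead of raising keeps a single bad message from failing the whole stream; whether to drop or log such messages is a choice for your pipeline.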