
Google Cloud Dataflow vs. Google Cloud Data Fusion

I recently saw that GCP has a new tool called Data Fusion. At first glance it looks like an easier way to build ETL pipelines than Dataflow. Can we assume that it is a replacement for Dataflow?

rish0097 asked Jul 09 '19 07:07



2 Answers

Data Fusion is not a replacement for Dataflow; the two are complementary. Data Fusion enables hybrid integration because it is built on CDAP, an open-source project. It also has metadata and lineage features that are not currently available in Dataflow.

Eslam Nawara answered Nov 15 '22 14:11



Cloud Data Fusion is based on CDAP, an open-source pipeline development tool that offers a visual interface for building ETL/ELT pipelines. CDAP supports major Hadoop distributions (MapR, Hortonworks) and clouds (AWS, GCP, Azure). On GCP, Data Fusion runs its jobs on a Cloud Dataproc cluster and ships with many prebuilt connectors for wiring sources to sinks, so you can develop pipelines without writing code. Data Fusion is also enterprise-ready: it provides data lineage and metadata management.

Dataflow, however, is a fully managed GCP service based on Apache Beam. Beam offers a unified programming model for developing pipelines that cover a wide range of data processing patterns, including ETL, batch computation, and continuous computation. The same code can handle both batch and real-time processing, and because the pipeline is written in Beam, you can choose among several runners when deploying it.
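To make the "same code, different runner" point concrete, here is a minimal word-count sketch using the Apache Beam Python SDK. It assumes `pip install apache-beam`. The sample lines and the names `split_words`, `format_count`, `build`, and `run` are made up for this example and are not from any official sample.

```python
"""Word-count sketch for the Apache Beam Python SDK (assumes apache-beam is installed)."""
import re


def split_words(line):
    """Lowercase a line and return the words it contains."""
    return re.findall(r"[a-z']+", line.lower())


def format_count(word_count):
    """Turn a (word, count) pair into a printable string."""
    word, count = word_count
    return f"{word}: {count}"


def build(lines):
    """Attach the transforms to a PCollection of text lines.

    The same chain can be applied to a bounded (batch) or an unbounded
    (streaming) input; an unbounded input would also need a windowing
    step before the count, which is left out of this sketch.
    """
    import apache_beam as beam  # imported lazily so the helpers above stay dependency-free

    return (
        lines
        | "Split" >> beam.FlatMap(split_words)
        | "Count" >> beam.combiners.Count.PerElement()
        | "Format" >> beam.Map(format_count)
    )


def run(argv=None):
    """Run the pipeline; the runner is chosen from command-line flags."""
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    # With no flags Beam uses its local DirectRunner. Passing
    # --runner=DataflowRunner (plus project, region, and temp_location
    # flags) would submit the same pipeline to Dataflow.
    options = PipelineOptions(argv)
    with beam.Pipeline(options=options) as pipeline:
        # Hypothetical in-memory input, used only for illustration.
        lines = pipeline | "Read" >> beam.Create(
            ["data fusion", "dataflow and data fusion"]
        )
        build(lines) | "Print" >> beam.Map(print)
```

Calling `run([])` executes the pipeline locally. Nothing in `build` changes when you switch runners, which is the part Data Fusion hides behind its visual editor.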

abhay answered Nov 15 '22 15:11
