I recently saw that GCP has a new tool called Data Fusion. Looking at it, it seems to be an easier way of creating ETL pipelines than Dataflow. So can we assume that it is a replacement for Dataflow?
Data Fusion is not a replacement for Dataflow but rather complementary to it. It enables hybrid integration because it is based on an open-source project called CDAP. It also has additional metadata and lineage features that are not currently available in Dataflow.
Cloud Data Fusion is based on CDAP, an open-source pipeline development tool that offers a visual interface for building ETL/ELT pipelines. CDAP supports major Hadoop distributions (MapR, Hortonworks) and clouds (AWS, GCP, Azure) for building pipelines. In GCP, Data Fusion uses a Cloud Dataproc cluster to run jobs and comes with multiple prebuilt connectors for connecting sources to sinks. It gives you codeless pipeline development. Data Fusion is also enterprise ready, providing data lineage and metadata management.
Dataflow, however, is a fully managed service in GCP based on Apache Beam. Beam offers a unified programming model for developing pipelines that cover a wide range of data processing patterns, including ETL, batch computation, and continuous computation. The same code can handle batch and real-time processing, and because the pipeline is written with Beam, you have a choice of runners for deploying it.
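To make the "same code for batch and real-time" point concrete, here is a minimal sketch in plain Python. It is not the Apache Beam API, and the names `word_counts`, `batch_source`, and `streaming_source` are made up for illustration. It only shows the idea of writing the transform once and feeding it either a bounded or an unbounded input.

```python
from collections import Counter
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def word_counts(lines: Iterable[str]) -> Iterator[Dict[str, int]]:
    """Emit a snapshot of the running word counts after each input line."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
        yield dict(counts)


def batch_source() -> List[str]:
    """Bounded input: every record is available up front (hypothetical data)."""
    return ["Data Fusion builds pipelines", "Dataflow runs pipelines"]


def streaming_source() -> Iterator[str]:
    """Unbounded input: a generator that never ends, standing in for a message queue."""
    while True:
        yield "Dataflow runs pipelines"


# Batch: consume the whole input and keep only the final snapshot.
batch_result = list(word_counts(batch_source()))[-1]

# Streaming: the same function, but results are read incrementally.
# Only the first three snapshots are taken here so the example terminates.
stream_results = list(islice(word_counts(streaming_source()), 3))
```

In a real Beam pipeline the transform would be expressed with Beam's own constructs, and the runner (Dataflow or another one) would decide how it is executed. The sketch only mirrors the separation between the transform logic and the kind of source it reads from.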