 

What is the difference between GCP Kubeflow and GCP cloud composer?

I am learning GCP and came across Kubeflow and Google Cloud Composer.
From what I have understood, both are used to orchestrate workflows, letting the user schedule and monitor pipelines on GCP.
The only difference I could figure out is that Kubeflow deploys and monitors Machine Learning models. Am I correct? In that case, since Machine Learning models are also objects, can't we orchestrate them using Cloud Composer? How does Kubeflow help, better than Cloud Composer, when it comes to managing Machine Learning models?

Thanks

Nizam asked Jan 26 '23

1 Answer

Kubeflow and Kubeflow Pipelines

Kubeflow is not exactly the same as Kubeflow Pipelines. The Kubeflow project mostly develops Kubernetes operators for distributed ML training (TFJob, PyTorchJob). The Pipelines project, on the other hand, develops a system for authoring and running pipelines on Kubernetes. KFP also ships some sample components, but the main product is the pipeline authoring SDK and the pipeline execution engine.
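To make the "authoring" side concrete, a KFP component is typically described by a small spec that wraps a containerized command-line program. The image name and program below are hypothetical; only the spec shape follows the KFP v1 component format:

```yaml
name: Train model
inputs:
- {name: training_data, type: Dataset}
outputs:
- {name: model, type: Model}
implementation:
  container:
    image: gcr.io/my-project/trainer:latest   # hypothetical image
    command: [python, train.py]
    args:
    - {inputPath: training_data}   # KFP mounts the input file here
    - {outputPath: model}          # and collects the output from here
```

Because the component only describes a container and its arguments, the program inside can be written in any language.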

Kubeflow Pipelines vs. Cloud Composer

The projects are pretty similar, but there are differences:

  • KFP uses Argo for execution and orchestration. Cloud Composer uses Apache Airflow.
  • KFP/Argo is designed for distributed execution on Kubernetes. Cloud Composer/Apache Airflow are more oriented toward single-machine execution.
  • KFP/Argo are language-agnostic - components can use any language (components describe containerized command-line programs). Cloud Composer/Apache Airflow use Python (Airflow operators are defined as Python classes).
  • KFP/Argo have the concept of data passing. Every component has inputs and outputs, and the pipeline connects them into a data passing graph. Cloud Composer/Apache Airflow do not really have data passing (Airflow has global variable storage and XCom, but it's not the same thing as explicit data passing), and the pipeline is a task dependency graph rather than a data dependency graph (KFP can also have task dependencies, but usually they're not needed).
  • KFP supports an execution caching feature that skips execution of tasks that have already been executed before.
  • KFP records all artifacts produced by pipeline runs in an ML Metadata database.
  • KFP has an experimental adapter which allows using Airflow operators as components.
  • KFP has a large, fast-growing ecosystem of custom components.
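Two of the points above - explicit data passing and input-keyed execution caching - can be sketched in plain Python. This does not use the real KFP or Airflow APIs (all names here are illustrative); it only mimics the behavior: each "component" consumes named inputs and produces an output, outputs are wired into downstream inputs to form the data-passing graph, and a task whose inputs were already seen is skipped.

```python
import hashlib
import json

_cache = {}      # maps (task name, hash of inputs) -> stored output
executions = []  # records which tasks actually ran, to show cache skips

def run_task(name, fn, **inputs):
    """Run a task, skipping execution if the same inputs were seen before."""
    digest = hashlib.sha256(json.dumps(inputs, sort_keys=True).encode()).hexdigest()
    key = (name, digest)
    if key in _cache:
        return _cache[key]  # cache hit: execution is skipped entirely
    executions.append(name)
    output = fn(**inputs)
    _cache[key] = output
    return output

# Illustrative "components": each takes named inputs and returns an output.
def preprocess(raw):
    return [x * 2 for x in raw]

def train(data):
    return sum(data)

# Wiring preprocess's output into train's input forms the data dependency graph.
cleaned = run_task("preprocess", preprocess, raw=[1, 2, 3])
model = run_task("train", train, data=cleaned)

# Re-running with identical inputs executes nothing new - only cache lookups.
cleaned_again = run_task("preprocess", preprocess, raw=[1, 2, 3])
```

In real KFP the cache key is derived from the component spec and input values, and outputs are stored as artifacts rather than in-process objects, but the skip-if-already-computed behavior is the same idea.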
Ark-kun answered Feb 11 '23