How Do You "Permanently" Delete An Experiment In MLflow?

Tags:

python

mlflow

Permanent deletion of an experiment isn't documented anywhere. I'm using MLflow with a PostgreSQL backend database.

Here's what I've run:

from mlflow.tracking import MlflowClient

client = MlflowClient(tracking_uri=server)  # server: the tracking server URI
client.delete_experiment(1)

This deletes the experiment, but when I then run a new experiment with the same name as the one I just deleted, I get this error:

mlflow.exceptions.MlflowException: Cannot set a deleted experiment 'cross-sell' as the active experiment. You can restore the experiment, or permanently delete the  experiment to create a new one.

I can't find anything in the documentation that shows how to delete an experiment permanently.

asked Feb 06 '20 at 06:02 by Riley Hun




3 Answers

Unfortunately it seems there is no way to do this via the UI or CLI at the moment :-/

The way to do it depends on the type of backend store you are using.

Filestore:

If you are using the filesystem as the storage mechanism (the default), it is easy. 'Deleted' experiments are moved to a .trash folder; you just need to clear that out:

rm -rf mlruns/.trash/*

As of the current version of the documentation (1.7.2), the docs remark:

It is recommended to use a cron job or an alternate workflow mechanism to clear .trash folder.
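For a scheduled job like the one the documentation suggests, here is a minimal Python sketch of the same cleanup as rm -rf mlruns/.trash/*. It assumes the default file-store layout (an mlruns directory containing .trash); the function name empty_trash is hypothetical. Unlike the shell glob, it also removes dot-entries inside .trash, and it keeps the .trash folder itself.

```python
import shutil
from pathlib import Path


def empty_trash(mlruns_dir: str = "mlruns") -> list[str]:
    """Permanently delete everything inside <mlruns_dir>/.trash.

    Returns the names of the removed entries. The .trash folder itself is kept.
    Hypothetical helper; assumes MLflow's default file-store layout.
    """
    trash = Path(mlruns_dir) / ".trash"
    if not trash.is_dir():
        return []  # nothing has been deleted yet
    removed = []
    # sorted() materializes the listing, so nothing is deleted mid-iteration.
    for entry in sorted(trash.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # e.g. a deleted experiment and its runs
        else:
            entry.unlink()
        removed.append(entry.name)
    return removed
```

Call empty_trash() from the directory that contains mlruns (or pass the path explicitly), for example from a cron-driven script.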

SQL Database:

This is trickier, because rows in dependent tables have to be deleted before the experiments they point to. I am using MySQL, and these commands work for me:

USE mlflow_db;  # the name of your database
DELETE FROM experiment_tags WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM tags WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
    )
);
DELETE FROM runs WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM experiments where lifecycle_stage="deleted";
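To see why the order matters, here is a small self-contained Python sketch using the standard-library sqlite3 module and a hypothetical, heavily simplified three-table schema (not MLflow's real one) with foreign keys enabled. Deleting the experiment while a run still references it fails with a foreign-key error; deleting children first (metrics, then runs, then experiments) succeeds. In this toy model, a UNIQUE constraint on name stands in for the name clash from the question, so the freed name can be reused afterwards. SQLite has no = ANY(subquery), so IN (subquery) is used instead; make_demo_db and purge_deleted are illustrative names.

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for MLflow's tables. The real
# schema has more tables (experiment_tags, tags, params, latest_metrics, ...).
SCHEMA = """
CREATE TABLE experiments (
    experiment_id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL,
    lifecycle_stage TEXT NOT NULL
);
CREATE TABLE runs (
    run_uuid TEXT PRIMARY KEY,
    experiment_id INTEGER NOT NULL REFERENCES experiments(experiment_id)
);
CREATE TABLE metrics (
    run_uuid TEXT NOT NULL REFERENCES runs(run_uuid),
    key TEXT NOT NULL,
    value REAL NOT NULL
);
"""

DELETED_EXPERIMENTS = (
    "SELECT experiment_id FROM experiments WHERE lifecycle_stage = 'deleted'"
)

# Children first, parents last, in the same spirit as the MySQL statements.
PURGE_STATEMENTS = [
    "DELETE FROM metrics WHERE run_uuid IN ("
    f"SELECT run_uuid FROM runs WHERE experiment_id IN ({DELETED_EXPERIMENTS}))",
    f"DELETE FROM runs WHERE experiment_id IN ({DELETED_EXPERIMENTS})",
    "DELETE FROM experiments WHERE lifecycle_stage = 'deleted'",
]


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory toy database with one deleted and one active experiment."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO experiments VALUES (?, ?, ?)",
        [(1, "cross-sell", "deleted"), (2, "upsell", "active")],
    )
    conn.executemany("INSERT INTO runs VALUES (?, ?)", [("r1", 1), ("r2", 2)])
    conn.executemany(
        "INSERT INTO metrics VALUES (?, ?, ?)",
        [("r1", "auc", 0.7), ("r2", "auc", 0.8)],
    )
    conn.commit()
    return conn


def purge_deleted(conn: sqlite3.Connection) -> None:
    """Delete everything belonging to deleted experiments, children first."""
    with conn:  # commit if every statement succeeds, roll back otherwise
        for stmt in PURGE_STATEMENTS:
            conn.execute(stmt)


if __name__ == "__main__":
    conn = make_demo_db()
    try:
        # Wrong order: run r1 still references the deleted experiment.
        conn.execute("DELETE FROM experiments WHERE lifecycle_stage = 'deleted'")
    except sqlite3.IntegrityError as exc:
        print("parent-first delete blocked:", exc)
        conn.rollback()
    purge_deleted(conn)
    print(conn.execute("SELECT name FROM experiments").fetchall())  # [('upsell',)]
```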

answered Oct 06 '22 at 04:10 by Lee Netherton


As of MLflow 1.11.0, the recommended way to permanently delete runs within an experiment is mlflow gc [OPTIONS].

From the documentation, mlflow gc will

Permanently delete runs in the deleted lifecycle stage from the specified backend store. This command deletes all artifacts and metadata associated with the specified runs.
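If you want to trigger this from Python (for example in a scheduled job), here is a minimal sketch that only assembles and runs the CLI command. It uses the --backend-store-uri option and the comma-separated --run-ids option of mlflow gc; the helper names build_gc_command and run_gc are hypothetical, and available options vary between MLflow releases, so check mlflow gc --help for your version.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def build_gc_command(
    backend_store_uri: str, run_ids: Optional[Sequence[str]] = None
) -> List[str]:
    """Assemble an `mlflow gc` invocation (hypothetical helper).

    Without run_ids, mlflow gc targets every run in the deleted lifecycle stage.
    """
    cmd = ["mlflow", "gc", "--backend-store-uri", backend_store_uri]
    if run_ids:
        # --run-ids takes a single comma-separated string
        cmd += ["--run-ids", ",".join(run_ids)]
    return cmd


def run_gc(backend_store_uri: str, run_ids: Optional[Sequence[str]] = None) -> None:
    """Run the command; raises CalledProcessError if mlflow gc fails."""
    cmd = build_gc_command(backend_store_uri, run_ids)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

For example, run_gc("postgresql://user:password@localhost:5432/mlflow"), where the URI is a placeholder for your own backend store.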

answered Oct 06 '22 at 04:10 by Moore


Here are SQL commands to permanently empty MLflow's trash if you are using PostgreSQL as the backend store.

Connect to your MLflow database (e.g. with \c mlflow in psql), then run:

DELETE FROM experiment_tags WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM metrics WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM tags WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM params WHERE run_uuid=ANY(
    SELECT run_uuid FROM runs WHERE experiment_id=ANY(
        SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
    )
);
DELETE FROM runs WHERE experiment_id=ANY(
    SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM experiments where lifecycle_stage='deleted';

The difference from the MySQL version above is that I added a DELETE statement for the params table.
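If you prefer to run this cleanup from Python, here is a minimal sketch assuming the third-party psycopg2 driver (not mentioned in the answer); the name purge_deleted_experiments is hypothetical. It builds the same seven DELETE statements, in the same order, from two shared subqueries, using IN (subquery), which PostgreSQL treats the same as = ANY(subquery). Running them in one transaction means a failure partway through rolls everything back instead of leaving a half-purged database.

```python
DELETED_EXPERIMENTS = (
    "SELECT experiment_id FROM experiments WHERE lifecycle_stage = 'deleted'"
)
DELETED_RUNS = (
    f"SELECT run_uuid FROM runs WHERE experiment_id IN ({DELETED_EXPERIMENTS})"
)

# Children first, parents last: the same order as the SQL in this answer.
CLEANUP_STATEMENTS = [
    f"DELETE FROM experiment_tags WHERE experiment_id IN ({DELETED_EXPERIMENTS})",
    *(
        f"DELETE FROM {table} WHERE run_uuid IN ({DELETED_RUNS})"
        for table in ("latest_metrics", "metrics", "tags", "params")
    ),
    f"DELETE FROM runs WHERE experiment_id IN ({DELETED_EXPERIMENTS})",
    "DELETE FROM experiments WHERE lifecycle_stage = 'deleted'",
]


def purge_deleted_experiments(dsn: str) -> None:
    """Run every cleanup statement in a single transaction (hypothetical helper)."""
    # psycopg2 is a third-party driver; importing it here keeps the
    # statement list above usable without it installed.
    import psycopg2

    conn = psycopg2.connect(dsn)
    try:
        with conn:  # commit if all statements succeed, roll back otherwise
            with conn.cursor() as cur:
                for stmt in CLEANUP_STATEMENTS:
                    cur.execute(stmt)
    finally:
        conn.close()  # leaving `with conn` ends the transaction, not the connection
```

For example, purge_deleted_experiments("dbname=mlflow user=mlflow host=localhost"), where the connection string is a placeholder for your own database.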

answered Oct 06 '22 at 03:10 by Dominik Franek