Permanent deletion of an experiment isn't documented anywhere. I'm using MLflow with a PostgreSQL backend database.
Here's what I've run:
from mlflow.tracking import MlflowClient
client = MlflowClient(tracking_uri=server)
client.delete_experiment(1)
This deletes the experiment, but when I run a new experiment with the same name as the one I just deleted, I get this error:
mlflow.exceptions.MlflowException: Cannot set a deleted experiment 'cross-sell' as the active experiment. You can restore the experiment, or permanently delete the experiment to create a new one.
I cannot find anywhere in the documentation that shows how to permanently delete everything.
Unfortunately it seems there is no way to do this via the UI or CLI at the moment :-/
The way to do it depends on the type of backend store that you are using.
Filestore:
If you are using the filesystem as a storage mechanism (the default) then it is easy. The 'deleted' experiments are moved to a .trash folder. You just need to clear that out:
rm -rf mlruns/.trash/*
As of the current version of the documentation (1.7.2), they remark:
"It is recommended to use a cron job or an alternate workflow mechanism to clear .trash folder."
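If you would rather do the cleanup from Python (for example from a scheduled job), here is a minimal sketch that does roughly what the rm command does. The function name clear_trash and the default mlruns path are assumptions for illustration; point it at wherever your file store actually lives.

```python
import shutil
from pathlib import Path


def clear_trash(mlruns_dir: str = "mlruns") -> int:
    """Delete every entry inside <mlruns_dir>/.trash.

    Returns the number of top-level entries removed. The .trash folder
    itself is kept, so MLflow can keep moving deleted experiments into it.
    """
    trash = Path(mlruns_dir) / ".trash"
    if not trash.is_dir():
        return 0  # no trash folder, nothing to clean up

    removed = 0
    for entry in trash.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # a deleted experiment directory
        else:
            entry.unlink()        # a stray file or symlink
        removed += 1
    return removed
```

Calling clear_trash("mlruns") from a cron-triggered script would cover the recurring cleanup that the documentation suggests. Like the rm command, this is irreversible, so a deleted experiment can no longer be restored afterwards.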
SQL Database:
This is trickier, as there are dependent rows that need to be deleted first. I am using MySQL, and these commands work for me:
USE mlflow_db; # the name of your database
DELETE FROM experiment_tags WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
)
);
DELETE FROM metrics WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
)
);
DELETE FROM tags WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
)
);
DELETE FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage="deleted"
);
DELETE FROM experiments where lifecycle_stage="deleted";
As of mlflow 1.11.0, the recommended way to permanently delete runs within an experiment is: mlflow gc [OPTIONS].
From the documentation, mlflow gc will:
"Permanently delete runs in the deleted lifecycle stage from the specified backend store. This command deletes all artifacts and metadata associated with the specified runs."
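As a rough sketch, this is how the command could be driven from a Python script. The helper names build_gc_command and run_gc are made up for illustration, and the connection string in the usage note is a placeholder. I am only using the --backend-store-uri and --run-ids options here; check mlflow gc --help for what your installed version supports.

```python
import subprocess
from typing import List, Optional, Sequence


def build_gc_command(
    backend_store_uri: str,
    run_ids: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build the argument list for an `mlflow gc` call.

    Without run_ids, gc targets all runs in the deleted lifecycle stage.
    """
    cmd = ["mlflow", "gc", "--backend-store-uri", backend_store_uri]
    if run_ids:
        # --run-ids takes a single comma-separated value
        cmd += ["--run-ids", ",".join(run_ids)]
    return cmd


def run_gc(backend_store_uri: str, run_ids: Optional[Sequence[str]] = None) -> None:
    """Run `mlflow gc`; raises CalledProcessError if the command fails."""
    subprocess.run(build_gc_command(backend_store_uri, run_ids), check=True)
```

For example, run_gc("postgresql://user:password@localhost/mlflow") would garbage-collect every deleted run in that (placeholder) database. Keeping the command construction separate from the subprocess call makes it easy to print the command and review it before anything is actually deleted.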
Here are the SQL commands to permanently empty the MLflow trash if you are using PostgreSQL as the backend store.
First switch to your MLflow database, e.g. by using: \c mlflow
and then run:
DELETE FROM experiment_tags WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM latest_metrics WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM metrics WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM tags WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM params WHERE run_uuid=ANY(
SELECT run_uuid FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
)
);
DELETE FROM runs WHERE experiment_id=ANY(
SELECT experiment_id FROM experiments where lifecycle_stage='deleted'
);
DELETE FROM experiments where lifecycle_stage='deleted';
The difference is that I added the DELETE command for the 'params' table.
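To illustrate why the order of these statements matters, here is a small self-contained sketch that runs the same sequence in one transaction. It is not MLflow's real schema: make_demo_db builds a heavily simplified stand-in in an in-memory SQLite database (the column lists are assumptions), and the queries use IN instead of =ANY because SQLite has no ANY. The function names are made up for the example.

```python
import sqlite3

# Sub-selects shared by the statements below.
DELETED = "SELECT experiment_id FROM experiments WHERE lifecycle_stage = 'deleted'"
DELETED_RUNS = f"SELECT run_uuid FROM runs WHERE experiment_id IN ({DELETED})"

# Child tables first, parent tables last, same order as the SQL above.
PURGE_STATEMENTS = [
    f"DELETE FROM experiment_tags WHERE experiment_id IN ({DELETED})",
    f"DELETE FROM latest_metrics WHERE run_uuid IN ({DELETED_RUNS})",
    f"DELETE FROM metrics WHERE run_uuid IN ({DELETED_RUNS})",
    f"DELETE FROM tags WHERE run_uuid IN ({DELETED_RUNS})",
    f"DELETE FROM params WHERE run_uuid IN ({DELETED_RUNS})",
    f"DELETE FROM runs WHERE experiment_id IN ({DELETED})",
    "DELETE FROM experiments WHERE lifecycle_stage = 'deleted'",
]


def purge_deleted_experiments(conn: sqlite3.Connection) -> None:
    """Run all purge statements in one transaction (rolled back on error)."""
    with conn:
        for statement in PURGE_STATEMENTS:
            conn.execute(statement)


def make_demo_db() -> sqlite3.Connection:
    """Create a simplified, assumed schema with one active and one deleted experiment."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # enforce parent/child constraints
    conn.executescript("""
        CREATE TABLE experiments (experiment_id INTEGER PRIMARY KEY,
                                  name TEXT UNIQUE, lifecycle_stage TEXT);
        CREATE TABLE experiment_tags (key TEXT, value TEXT,
            experiment_id INTEGER REFERENCES experiments(experiment_id));
        CREATE TABLE runs (run_uuid TEXT PRIMARY KEY,
            experiment_id INTEGER REFERENCES experiments(experiment_id));
        CREATE TABLE metrics (key TEXT, value REAL, run_uuid TEXT REFERENCES runs(run_uuid));
        CREATE TABLE latest_metrics (key TEXT, value REAL, run_uuid TEXT REFERENCES runs(run_uuid));
        CREATE TABLE params (key TEXT, value TEXT, run_uuid TEXT REFERENCES runs(run_uuid));
        CREATE TABLE tags (key TEXT, value TEXT, run_uuid TEXT REFERENCES runs(run_uuid));

        INSERT INTO experiments VALUES (0, 'Default', 'active'), (1, 'cross-sell', 'deleted');
        INSERT INTO experiment_tags VALUES ('team', 'a', 0), ('team', 'b', 1);
        INSERT INTO runs VALUES ('run-b', 0), ('run-a', 1);
        INSERT INTO metrics VALUES ('auc', 0.8, 'run-b'), ('auc', 0.7, 'run-a');
        INSERT INTO latest_metrics VALUES ('auc', 0.8, 'run-b'), ('auc', 0.7, 'run-a');
        INSERT INTO params VALUES ('depth', '3', 'run-b'), ('depth', '5', 'run-a');
        INSERT INTO tags VALUES ('source', 'x', 'run-b'), ('source', 'y', 'run-a');
    """)
    return conn
```

After purge_deleted_experiments(make_demo_db()), only the active experiment and its run remain, and the name 'cross-sell' can be used again. With foreign keys enforced, deleting from runs before params would fail with an integrity error, which is presumably why the extra params statement is needed on a real database. Against PostgreSQL you would run the original statements through your own driver; take a backup first.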