I am trying to submit an experiment in Azure Machine Learning service locally on an Azure VM, using a ScriptRunConfig object in my workspace ws, as in:
from azureml.core import ScriptRunConfig
from azureml.core.runconfig import RunConfiguration
from azureml.core import Experiment
experiment = Experiment(ws, name='test')
run_local = RunConfiguration()
script_params = {
    '--data-folder': './data',
    '--training-data': 'train.csv'
}

src = ScriptRunConfig(source_directory='./source_dir',
                      script='train.py',
                      run_config=run_local,
                      arguments=script_params)
run = experiment.submit(src)
However, this fails with
ExperimentExecutionException: { "error_details": { "correlation": { "operation": "bb12f5b8bd78084b9b34f088a1d77224", "request": "iGfp+sjC34Q=" }, "error": { "code": "UserError", "message": "Failed to deserialize run definition"
Worse, if I set my data folder to use a datastore (which I will likely need to do):
script_params = {
    '--data-folder': ds.path('mydatastoredir').as_mount(),
    '--training-data': 'train.csv'
}
the error is
UserErrorException: Dictionary with non-native python type values are not supported in runconfigs.
{'--data-folder': $AZUREML_DATAREFERENCE_d93269a580ec4ecf97be428cd2fe79, '--training-data': 'train.csv'}
I don't quite understand how I should pass my script_params parameters to my train.py (unfortunately the documentation of ScriptRunConfig doesn't go into much detail on this).
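For context, train.py reads these two arguments with standard argparse parsing, along these lines (a simplified sketch of the actual script):

import argparse
import os
import pandas as pd

# Simplified sketch: parse the two arguments passed via script_params
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, dest='data_folder')
parser.add_argument('--training-data', type=str, dest='training_data')
args = parser.parse_args()

# Load the training data from the (local or mounted) data folder
train_df = pd.read_csv(os.path.join(args.data_folder, args.training_data))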
Does anybody know how to properly create src in these two cases?
In the end I abandoned ScriptRunConfig and used Estimator as follows to pass script_params (after having provisioned a compute target):
estimator = Estimator(source_directory='./mysourcedir',
                      script_params=script_params,
                      compute_target='cluster',
                      entry_script='train.py',
                      conda_packages=["pandas"],
                      pip_packages=["git+https://github.com/..."],
                      use_docker=True,
                      custom_docker_image='<mydockeraccount>/<mydockerimage>')
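Here Estimator comes from azureml.train.estimator, and script_params is the same dictionary as in the question, including the datastore mount, which Estimator accepts as a dictionary value (unlike ScriptRunConfig):

from azureml.train.estimator import Estimator

# Same dictionary as in the question; the mount is resolved at run time on the compute target
script_params = {
    '--data-folder': ds.path('mydatastoredir').as_mount(),
    '--training-data': 'train.csv'
}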
This also allowed me to install my pip_packages dependencies by pointing custom_docker_image to a Docker image I pushed to https://hub.docker.com/, created from a Dockerfile like:
FROM continuumio/miniconda
RUN apt-get update
RUN apt-get install git gcc g++ -y
(it worked!)
The correct way of passing arguments to ScriptRunConfig and RunConfiguration is as a list of strings, according to https://learn.microsoft.com/nb-no/python/api/azureml-core/azureml.core.runconfiguration?view=azure-ml-py.
The modified, working code would be as follows:
from azureml.core import ScriptRunConfig
from azureml.core.runconfig import RunConfiguration
from azureml.core import Experiment
experiment = Experiment(ws, name='test')
run_local = RunConfiguration()
script_params = [
    '--data-folder',
    './data',
    '--training-data',
    'train.csv'
]

src = ScriptRunConfig(source_directory='./source_dir',
                      script='train.py',
                      run_config=run_local,
                      arguments=script_params)
run = experiment.submit(src)
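For the datastore case, one approach (a sketch I have not verified end to end) is to register the DataReference on the RunConfiguration via data_references and pass its string form in the arguments list; the string expands to the mounted path when the run executes:

# Sketch for the datastore case: register the data reference on the run config
ds_ref = ds.path('mydatastoredir').as_mount()
run_local.data_references = {ds_ref.data_reference_name: ds_ref.to_config()}

# Pass the reference as a plain string argument; it resolves to the mount path at run time
script_params = [
    '--data-folder', str(ds_ref),
    '--training-data', 'train.csv'
]

src = ScriptRunConfig(source_directory='./source_dir',
                      script='train.py',
                      run_config=run_local,
                      arguments=script_params)
run = experiment.submit(src)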