 

How to pass parameters to a training script in Azure Machine Learning service?

I am trying to submit an experiment in Azure Machine Learning service locally on an Azure VM, using a ScriptRunConfig object in my workspace ws, as follows:

from azureml.core import ScriptRunConfig    
from azureml.core.runconfig import RunConfiguration
from azureml.core import Experiment

experiment = Experiment(ws, name='test')
run_local = RunConfiguration()

script_params = {
    '--data-folder': './data',
    '--training-data': 'train.csv'
}

src = ScriptRunConfig(source_directory = './source_dir', 
                      script = 'train.py', 
                      run_config = run_local, 
                      arguments = script_params)

run = experiment.submit(src)

However, this fails with

ExperimentExecutionException: { "error_details": { "correlation": { "operation": "bb12f5b8bd78084b9b34f088a1d77224", "request": "iGfp+sjC34Q=" }, "error": { "code": "UserError", "message": "Failed to deserialize run definition"

Worse, if I set my data folder to use a datastore (which I will likely need to do)

script_params = {
    '--data-folder': ds.path('mydatastoredir').as_mount(),
    '--training-data': 'train.csv'
}

the error is

UserErrorException: Dictionary with non-native python type values are not supported in runconfigs.
{'--data-folder': $AZUREML_DATAREFERENCE_d93269a580ec4ecf97be428cd2fe79, '--training-data': 'train.csv'}

I don't quite understand how I should pass my script_params to train.py (unfortunately, the ScriptRunConfig documentation doesn't give much detail on this).

Does anybody know how to properly create src in these two cases?

asked Apr 06 '19 at 22:04 by Davide Fiocco


2 Answers

In the end, I abandoned ScriptRunConfig and used an Estimator instead, which accepts script_params as a dict (after provisioning a compute target):

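from azureml.train.estimator import Estimator

# script_params is the dict from the question; 'cluster' is the name of the provisioned compute target
estimator = Estimator(source_directory='./mysourcedir',
                      script_params=script_params,
                      compute_target='cluster',
                      entry_script='train.py',
                      conda_packages=["pandas"],
                      pip_packages=["git+https://github.com/..."],
                      use_docker=True,
                      custom_docker_image='<mydockeraccount>/<mydockerimage>')

run = experiment.submit(estimator)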

This also allowed me to install my pip_packages dependency: I built a Docker image from a Dockerfile like the one below, pushed it to https://hub.docker.com/, and referenced it via custom_docker_image:

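FROM continuumio/miniconda
# git is needed to pip-install from a git+https URL; gcc/g++ for packages built from source
# update and install in one layer so the package index is never stale
RUN apt-get update && apt-get install -y git gcc g++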

(it worked!)

answered Oct 12 '22 at 00:10 by Davide Fiocco


According to the RunConfiguration documentation (https://learn.microsoft.com/nb-no/python/api/azureml-core/azureml.core.runconfiguration?view=azure-ml-py), the correct way to pass arguments to ScriptRunConfig and RunConfiguration is as a list of strings rather than a dict.
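
If the parameters are already written as a dict (as in the question, or for an Estimator), a small helper can flatten them into that flag/value list. This is a hypothetical sketch, not part of the azureml SDK; to_argument_list is an illustrative name. It deliberately rejects non-plain values such as a datastore reference, mirroring the UserErrorException in the question, because how those objects should be passed is not covered here.

```python
from typing import Dict, List, Union

# Plain values that can safely be turned into command-line strings
PlainValue = Union[str, int, float]


def to_argument_list(script_params: Dict[str, PlainValue]) -> List[str]:
    """Flatten {'--flag': value, ...} into ['--flag', 'value', ...].

    Hypothetical helper; dict insertion order (Python 3.7+) is preserved,
    so each flag stays directly in front of its value.
    """
    arguments: List[str] = []
    for flag, value in script_params.items():
        if not isinstance(value, (str, int, float)):
            # e.g. a datastore reference: not a plain value, so refuse it
            raise TypeError(
                f"{flag}: expected str/int/float, got {type(value).__name__}"
            )
        arguments.extend([flag, str(value)])
    return arguments


script_params = {
    '--data-folder': './data',
    '--training-data': 'train.csv'
}
print(to_argument_list(script_params))
# ['--data-folder', './data', '--training-data', 'train.csv']
```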

The modified, working code is as follows.

from azureml.core import Experiment, ScriptRunConfig
from azureml.core.runconfig import RunConfiguration

# ws is the existing Workspace object from the question
experiment = Experiment(ws, name='test')
run_local = RunConfiguration()

# Arguments are a flat list of strings: each flag is followed by its value
script_params = [
    '--data-folder', './data',
    '--training-data', 'train.csv',
]

src = ScriptRunConfig(source_directory='./source_dir',
                      script='train.py',
                      run_config=run_local,
                      arguments=script_params)

run = experiment.submit(src)
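
On the train.py side, the list arrives as ordinary command-line arguments. The question does not show train.py, so the following is a hypothetical sketch that reads the two options with argparse; the defaults simply mirror the values in script_params, and parse_args/main are illustrative names, not part of the Azure ML API.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the flags passed through ScriptRunConfig(arguments=[...]).

    Hypothetical sketch: option names mirror the question's script_params.
    """
    parser = argparse.ArgumentParser(description="Training script (sketch)")
    # argparse stores '--data-folder' as args.data_folder
    parser.add_argument("--data-folder", type=str, default="./data",
                        help="folder that contains the training data")
    parser.add_argument("--training-data", type=str, default="train.csv",
                        help="name of the training CSV inside the data folder")
    # argv=None makes argparse read sys.argv[1:], i.e. the submitted arguments
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Build the path to the CSV; the real training code would load it here
    train_path = os.path.join(args.data_folder, args.training_data)
    print(f"Reading training data from {train_path}")
    return train_path


if __name__ == "__main__":
    main()
```

Submitted with the list from the answer above, the script would see sys.argv[1:] equal to ['--data-folder', './data', '--training-data', 'train.csv'].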

answered Oct 11 '22 at 23:10 by Ole-Henrik Borlaug