How to connect AMLS to ADLS Gen 2?

I would like to register a dataset from ADLS Gen2 in my Azure Machine Learning workspace (azureml-core==1.12.0). Since service principal information is not required according to the Python SDK documentation for .register_azure_data_lake_gen2(), I successfully used the following code to register ADLS Gen2 as a datastore:

import os

from azureml.core import Datastore

# ws is an existing azureml.core.Workspace object (e.g. from Workspace.from_config())
adlsgen2_datastore_name = os.environ['adlsgen2_datastore_name']
account_name = os.environ['account_name']  # ADLS Gen2 account name
file_system = os.environ['filesystem']     # ADLS Gen2 filesystem (container) name

adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name=adlsgen2_datastore_name,
    account_name=account_name,
    filesystem=file_system
)

However, when I try to register a dataset, using

from azureml.core import Dataset, Datastore

adls_ds = Datastore.get(ws, datastore_name=adlsgen2_datastore_name)
data = Dataset.Tabular.from_delimited_files((adls_ds, 'folder/data.csv'))

I get an error:

Cannot load any data from the specified path. Make sure the path is accessible and contains data. ScriptExecutionException was caused by StreamAccessException. StreamAccessException was caused by AuthenticationException. 'AdlsGen2-ReadHeaders' for '[REDACTED]' on storage failed with status code 'Forbidden' (This request is not authorized to perform this operation using this permission.), client request ID <CLIENT_REQUEST_ID>, request ID <REQUEST_ID>. Error message: [REDACTED] | session_id=<SESSION_ID>

Do I need to enable the service principal to get this to work? Using the ML Studio UI, it appears that a service principal is required even to register the datastore.

Another issue I noticed is that AMLS tries to access the dataset at https://adls_gen2_account_name.dfs.core.windows.net/container/folder/data.csv, whereas the actual URI in ADLS Gen2 is https://adls_gen2_account_name.blob.core.windows.net/container/folder/data.csv.

asked Sep 14 '20 by bab689



1 Answer

According to this documentation, you need to enable the service principal.

1. Register your application and grant the service principal Storage Blob Data Reader access on the storage account (a quick way to verify the role assignment is sketched below).

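As a sanity check that the role assignment has propagated, you can try reading the filesystem directly with the service principal. This is only a sketch; it assumes the azure-identity and azure-storage-file-datalake packages are installed, and that tenant_id, client_id, client_secret, account_name and file_system hold your own values:

from azure.identity import ClientSecretCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Authenticate as the service principal created in the app registration above.
credential = ClientSecretCredential(tenant_id, client_id, client_secret)

service_client = DataLakeServiceClient(
    account_url=f"https://{account_name}.dfs.core.windows.net",
    credential=credential
)

# List a few paths in the filesystem; a 403 here means the
# Storage Blob Data Reader assignment has not taken effect yet.
file_system_client = service_client.get_file_system_client(file_system)
for path in file_system_client.get_paths():
    print(path.name)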

2. Try this code:

from azureml.core import Datastore, Dataset

# tenant_id, client_id and client_secret come from the app registration above
adlsgen2_datastore = Datastore.register_azure_data_lake_gen2(
    workspace=ws,
    datastore_name=adlsgen2_datastore_name,
    account_name=account_name,
    filesystem=file_system,
    tenant_id=tenant_id,
    client_id=client_id,
    client_secret=client_secret
)

adls_ds = Datastore.get(ws, datastore_name=adlsgen2_datastore_name)
dataset = Dataset.Tabular.from_delimited_files((adls_ds, 'sample.csv'))
print(dataset.to_pandas_dataframe())

Result: the CSV loads and the DataFrame prints as expected.
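If the goal (as in the original question) is to register the dataset in the workspace, a minimal follow-up sketch, using an illustrative dataset name of adls_gen2_sample:

# Register the TabularDataset so it can be retrieved by name later;
# 'adls_gen2_sample' is just an illustrative name.
registered_ds = dataset.register(
    workspace=ws,
    name='adls_gen2_sample',
    description='Sample CSV read from the ADLS Gen2 datastore',
    create_new_version=True
)

# Later, retrieve it by name from the workspace.
same_ds = Dataset.get_by_name(ws, name='adls_gen2_sample')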

answered Sep 16 '22 by Steve Zhao