
Updating pandas to version 0.19 in Azure ML Studio

I would really like to get access to some of the updated functions in pandas 0.19, but Azure ML Studio uses pandas 0.18 as part of the Anaconda 4.0 bundle. Is there a way to update the version that is used within the "Execute Python Script" components?

user4446237 asked Sep 14 '17 15:09


1 Answer

Here are the steps to update the version of the pandas library used in Execute Python Script.

Step 1: Use virtualenv to create an independent Python runtime environment on your system. If you don't have it, install it first with the command pip install virtualenv.

If it installed successfully, you will see it in your Python Scripts folder.


Step 2: Run the command to create the independent Python runtime environment.


Step 3: Go into the Scripts folder of the created directory and activate the environment (this step is important, don't skip it).

Keep this command window open and run pip install pandas==0.19 in it to download the library into the environment.
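The screenshots for Steps 1 to 3 are not available, so here is a rough Python sketch of what those commands probably look like. It is an assumption, not a copy of the original commands: build_env_commands and run_commands are hypothetical helper names, and the environment directory is whatever you choose. Instead of activating the environment, the sketch calls the environment's own interpreter, which installs into the same place.

```python
import os
import subprocess
import sys


def build_env_commands(env_dir, pandas_version="0.19"):
    """Return the commands for Steps 1-3 as argument lists, without running them."""
    # virtualenv puts executables in Scripts on Windows and in bin elsewhere.
    scripts_dir = "Scripts" if os.name == "nt" else "bin"
    env_python = os.path.join(env_dir, scripts_dir, "python")
    return [
        # Step 1: install virtualenv into the current interpreter.
        [sys.executable, "-m", "pip", "install", "virtualenv"],
        # Step 2: create the isolated environment.
        [sys.executable, "-m", "virtualenv", env_dir],
        # Step 3: install pandas with the environment's own interpreter,
        # which has the same effect as activating it and running pip.
        [env_python, "-m", "pip", "install", "pandas==" + pandas_version],
    ]


def run_commands(commands):
    """Run each command in order, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

Calling run_commands(build_env_commands("pandas-env")) would execute the three steps; "pandas-env" is only a placeholder directory name.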


Step 4: Compress all of the files in the Lib/site-packages folder into a zip package (I'm calling it pandas - package here).
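If you would rather script Step 4 than use a zip tool, something like the following should work. This is a minimal sketch using only the standard library; zip_site_packages is a hypothetical helper name, and the key assumption is that the packages need to sit at the root of the archive so that they can be imported directly from the folder the zip is extracted to.

```python
import os
import zipfile


def zip_site_packages(site_packages_dir, zip_path):
    """Zip the contents of site_packages_dir with packages at the archive root."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(site_packages_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to site-packages, e.g. pandas/__init__.py
                arcname = os.path.relpath(full_path, site_packages_dir)
                archive.write(full_path, arcname)
    return zip_path
```

Write the zip file somewhere outside the site-packages folder, otherwise the archive would try to include itself.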


Step 5: Upload the zip package to your Azure Machine Learning workspace as a dataset.


For the specific steps, please refer to the Technical Notes.

Once the upload succeeds, you will see the package in the dataset list.


Step 6: Before the definition of the azureml_main method in the Execute Python Script module, remove the old pandas module and its dependencies from sys.modules, then import pandas again, as in the code below.

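import sys
import pandas as pd

# Version bundled with Azure ML Studio, before the swap
print(pd.__version__)

# Drop the preloaded pandas and its dependencies from the module cache
del sys.modules['pandas']
del sys.modules['numpy']
del sys.modules['pytz']
del sys.modules['six']
del sys.modules['dateutil']

# Search the folder holding the contents of the uploaded zip first
sys.path.insert(0, '.\\Script Bundle')

# Drop cached submodules as well (pandas.core, numpy.linalg, ...)
prefixes = ('pandas.', 'numpy.', 'pytz.', 'dateutil.', 'six.')
for td in [m for m in sys.modules if m.startswith(prefixes)]:
    del sys.modules[td]

# Re-import: pandas is now loaded from the Script Bundle folder
import pandas as pd
print(pd.__version__)

# The entry point function can contain up to two input arguments:
#   Param<dataframe1>: a pandas.DataFrame
#   Param<dataframe2>: a pandas.DataFrame
def azureml_main(dataframe1 = None, dataframe2 = None):
    # Return value must be a sequence of pandas.DataFrame
    return dataframe1,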

The logs then show the old version first (0.14.0), followed by the new version (0.19.0) loaded from the uploaded zip file:

[Information]         0.14.0
[Information]         0.19.0
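If you need to swap more than one library, the module cleanup from Step 6 can be wrapped in a small helper. This is only a sketch: purge_modules is a hypothetical name, not part of Azure ML or pandas. Unlike a plain del, it quietly skips packages that were never imported, so it does not raise KeyError.

```python
import sys


def purge_modules(package_names):
    """Remove the named packages and their submodules from sys.modules."""
    prefixes = tuple(name + "." for name in package_names)
    removed = []
    # Iterate over a copy because entries are deleted during the loop.
    for module_name in list(sys.modules):
        if module_name in package_names or module_name.startswith(prefixes):
            del sys.modules[module_name]
            removed.append(module_name)
    return removed
```

In the script above you would call purge_modules(['pandas', 'numpy', 'pytz', 'six', 'dateutil']) after inserting the Script Bundle folder into sys.path and before importing pandas again.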

You could also refer to these threads: "Access blob file using time stamp in Azure" and "reload with reset".

Hope it helps you.

Jay Gong answered Sep 29 '22 15:09