Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

SSIS: Execute Ironpython or Ironruby scripts through SSIS

Problem!

I have a little python script, which goes throught a web page (http-crawling). This web-page is hosted inside the intranet and uses NTLM authentication to gather access to it.

So, I found this task (retrieve http-content) easily programmable using python, instead of trying to re-write the whole python script to C# and then use it througth "Script Task" on SSIS, in order to complete the task.

Hint!

I've looked up closely to SSIS tools and I found that there is a Control Flow named "Execute Process Task", which lets you to execute Win32 executables.

But the problem resides in how to call my python script since it's not executable and needs to be interpreted by the python interpreter (if you'll forgive the repetition). So, I could easily end up building a simple ".bat" file that calls both the python script and the interpreter. And then execute that file through SSIS "Execute Process Task".

Question!

Is there any other way to implement this? (neat way)

Edit #1

Usage

The information retrieved from the script will be storing that information into a table from a database, So that information will be accessed trough the database table from another SSIS process.

I'm retrieving the information from different sources (flat files, database tables, http request, ...) in order to archive that information into a database that could be posted in a web services and then accessed from a Excel project.

Thanks in Advance!

like image 351
Eder Avatar asked Apr 24 '12 19:04

Eder


1 Answers

The easiest, at least to my brain, mechanism for using IronPython from the confines of SSIS would be to invoke the external process and dump to a file and then use that as a source for a dataflow.

That said, I was able to host an IronPython app from C# and use the returned data to populate the output buffers and interact with that data in the pipeline. I've only had one machine to perform this on so I'm listing everything I recall doing until the package went green.

Prerequisites

This article set me down the path of how to make this work. Hosting IronPython in a C# 4.0 program I would strongly urge you to create a C#/VB.NET console app and get your IronPython integration working there first as SSIS is going to add an additional layer to everything.

There may be the ability to host older versions of IronPython within C# without requiring the 4.0 framework but that's far beyond the realm of my competency. What I can say is that to use the 4.0 framework, you are looking at SQL Server 2012. A 2008 package can target up to the 3.5 framework (default is 2.0).

Global Assembly Cache, GAC for short. It is a special place in Windows where signed assemblies can live. SSIS may be able to use assemblies that aren't in the GAC, but I've not had luck doing so. This case was no different. My Console app worked fine but when I copied that code into SSIS, it'd tank with Could not load file or assembly 'Microsoft.Scripting... error messages. Blessedly, IronPython-2.7.2.1 (and probably previous versions) are strongly signed dlls. That means you can and must add them into the GAC.

In your Visual Studio directory, look for the Visual Studio Command Prompt (2010). Assuming your IronPython installation folder is C:\tmp\IronPython-2.7.2.1\IronPython-2.7.2.1 you would type cd C:\tmp\IronPython-2.7.2.1\IronPython-2.7.2.1 Then I registered the following 3 assemblies

C:\tmp\IronPython-2.7.2.1\IronPython-2.7.2.1>gacutil -if Microsoft.Dynamic.dll
Microsoft (R) .NET Global Assembly Cache Utility.  Version 4.0.30319.1
Copyright (c) Microsoft Corporation.  All rights reserved.

Assembly successfully added to the cache

C:\tmp\IronPython-2.7.2.1\IronPython-2.7.2.1>gacutil -if IronPython.dll
Microsoft (R) .NET Global Assembly Cache Utility.  Version 4.0.30319.1
Copyright (c) Microsoft Corporation.  All rights reserved.

Assembly successfully added to the cache

C:\tmp\IronPython-2.7.2.1\IronPython-2.7.2.1>gacutil -if Microsoft.Scripting.dll
Microsoft (R) .NET Global Assembly Cache Utility.  Version 4.0.30319.1
Copyright (c) Microsoft Corporation.  All rights reserved.

Assembly successfully added to the cache

My SSIS project, I had set the Run64bitRuntime to False but in retesting, it does not matter. The default it True and that seems to work fine.

Python script - I don't have enough of a background to make the integration between C# and .NET DLR languages more graceful. It'd have been nice to supply a string or something containing the script I wanted to execute and perhaps that's what a script block is about but I don't have time to investigate. So, this solution requires a script file sitting out somewhere on disk. I had trouble with the imports working from a hosted script (no module named X exceptions). Undoubtedly there's some magic with class paths and all that stuff that needs to provided to the host to make it work well. That's probably a different SO question btw.

Set up

I have a file sitting at C:\ssisdata\simplePy.py

# could not get a simple import to work from hosted
# works fine from "not hosted"
#import os

def GetIPData():
    #os.listdir(r'C:\\')
    return range(0,100)

After adding a script task to the Data Flow, I configured it to have a single column on the output buffer (wstr 1000). I then used this as my source code.

using System;
using System.Collections.Generic;
using System.Data;
using Microsoft.SqlServer.Dts.Pipeline.Wrapper;
using Microsoft.SqlServer.Dts.Runtime.Wrapper;
using IronPython.Hosting;
using Microsoft.Scripting.Hosting;

/// <summary>
/// Attempt to use IP script as a source
/// http://blogs.msdn.com/b/charlie/archive/2009/10/25/hosting-ironpython-in-a-c-4-0-program.aspx
/// </summary>
[Microsoft.SqlServer.Dts.Pipeline.SSISScriptComponentEntryPointAttribute]
public class ScriptMain : UserComponent
{

    /// <summary>
    /// Create data rows and fill those buckets
    /// </summary>
    public override void CreateNewOutputRows()
    {
        foreach (var item in this.GetData())
        {
            Output0Buffer.AddRow();
            Output0Buffer.Content = item;
        }

    }

    /// <summary>
    /// I've written plenty of code, but I'm quite certain this is some of the ugliest.
    /// There certainly must be more graceful means of 
    /// * feeding your source code to the ironpython run-time than a file
    /// * processing the output of the code the method call
    /// * sucking less at life
    /// </summary>
    /// <returns>A list of strings</returns>
    public List<string> GetData()
    {
        List<string> output = null;
        var ipy = Python.CreateRuntime();
        dynamic test = ipy.UseFile(@"C:\ssisdata\simplePy.py");
        output = new List<string>();
        var pythonData = test.GetIPData();
        foreach (var item in pythonData)
        {
            output.Add(item.ToString());
        }

        return output;
    }
}

Quick shot of what my references look like

References

Click the run button and great success

Data flow

like image 101
billinkc Avatar answered Nov 15 '22 06:11

billinkc