
Import most recent csv file to sql server in ssis

Tags:

ssis

I have a folder in which I receive .csv files every half hour, with timestamps in their names. I need to take the latest of the available files and import it into SQL Server.

For example, in my source folder I have:

test_01112012_120122.csv
test_01112012_123022.csv
test_01112012_123555.csv

Now I need to fetch the latest file and import it into SQL Server with the help of SSIS.

Thanks,
Satish

0537 asked Jan 12 '12 06:01



2 Answers

The code from @garry Vass, or one like it, is going to be needed even if you're using SSIS as your import tool.

Within SSIS, you will need to update the connection string of your flat file connection manager to point to the new file. Ergo, you need to determine which file is the most recent.

Finding the most recent file

Whether you do it by file attributes (Garry's code) or by slicing and dicing file names depends on your business rules. Is it always the most recently modified file (attribute), or does it need to be based on the file name interpreted as a sequence? This matters if test_01112012_120122.csv had a mistake in it and its contents are later corrected. The modified date will change but the file name will not, so a name-based rule would never pick the corrected file up again and those changes wouldn't get ported back into the database.
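If your business rule turns out to be "newest by file name", a minimal sketch of that variant might look like the following. It assumes the names always end in a stamp of the form MMddyyyy_HHmmss (the question does not say whether 01112012 is month-first or day-first, so adjust the format string to match your feed). FileNamePicker, TryGetStamp and GetLatestByName are hypothetical names used only for illustration. It avoids LINQ so it does not depend on the target framework of the script project.

```csharp
using System;
using System.Globalization;
using System.IO;

public static class FileNamePicker
{
    // Assumption: file names end in a stamp such as 01112012_123555 (MMddyyyy_HHmmss).
    private const string StampFormat = "MMddyyyy_HHmmss";

    // Extracts the trailing stamp from a file name; returns false if it cannot be parsed.
    public static bool TryGetStamp(string path, out DateTime stamp)
    {
        stamp = DateTime.MinValue;
        string name = Path.GetFileNameWithoutExtension(path);
        if (name == null || name.Length < StampFormat.Length)
        {
            return false;
        }

        string tail = name.Substring(name.Length - StampFormat.Length);
        return DateTime.TryParseExact(tail, StampFormat, CultureInfo.InvariantCulture,
            DateTimeStyles.None, out stamp);
    }

    // Returns the path whose name carries the latest stamp, or null if none can be parsed.
    public static string GetLatestByName(string[] paths)
    {
        string latest = null;
        DateTime latestStamp = DateTime.MinValue;

        foreach (string path in paths)
        {
            DateTime stamp;
            if (TryGetStamp(path, out stamp) && (latest == null || stamp > latestStamp))
            {
                latest = path;
                latestStamp = stamp;
            }
        }

        return latest;
    }
}
```

You could feed it System.IO.Directory.GetFiles(rootFolder, "*.csv") and assign the result to the package variable the same way the attribute-based script does. Files whose names do not match the assumed pattern are simply skipped.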

I would suggest you create 2 variables of type String, scoped to the package, named RootFolder and CurrentFile. Optionally, you can create one called FileMask if you are restricting to a particular type like *.csv. RootFolder would be the base folder you expect to find files in, e.g. C:\ssisdata\MyProject. CurrentFile will be assigned, by a script, the fully qualified path to the most recently modified file. I find it helpful at this point to assign a design-time value to CurrentFile, usually the oldest file in the collection.

Drag a Script Task onto the Control Flow and set User::RootFolder (optionally User::FileMask) as its ReadOnlyVariables. Its ReadWriteVariables would be User::CurrentFile. Then click Edit Script.

This script goes inside the braces of public partial class ScriptMain: ...

    /// <summary>
    /// This verbose script identifies the most recently modified file of type fileMask
    /// living in RootFolder and assigns that to a DTS level variable.
    /// </summary>
    public void Main()
    {
        string fileMask = "*.csv";
        string mostRecentFile = string.Empty;
        string rootFolder = string.Empty;

        // Assign values from the DTS variables collection.
        // This is case sensitive. User:: is not required
        // but you must convert it from the Object type to a strong type
        rootFolder = Dts.Variables["User::RootFolder"].Value.ToString();

        // Repeat the above pattern to assign a value to fileMask if you wish
        // to make it a more flexible approach

        // Determine the most recent file, this could be null
        System.IO.FileInfo candidate = ScriptMain.GetLatestFile(rootFolder, fileMask);

        if (candidate != null)
        {
            mostRecentFile = candidate.FullName;
        }

        // Push the results back onto the variable
        Dts.Variables["CurrentFile"].Value = mostRecentFile;

        Dts.TaskResult = (int)ScriptResults.Success;
    }

    /// <summary>
    /// Find the most recent file matching a pattern
    /// </summary>
    /// <param name="directoryName">Folder to begin searching in</param>
    /// <param name="fileExtension">Extension to search, e.g. *.csv</param>
    /// <returns>The most recently modified matching file, or null if nothing matches</returns>
    private static System.IO.FileInfo GetLatestFile(string directoryName, string fileExtension)
    {
        System.IO.DirectoryInfo directoryInfo = new System.IO.DirectoryInfo(directoryName);

        System.IO.FileInfo mostRecent = null;

        // Change the SearchOption to AllDirectories if you need to search subfolders
        System.IO.FileInfo[] legacyArray = directoryInfo.GetFiles(fileExtension, System.IO.SearchOption.TopDirectoryOnly);
        foreach (System.IO.FileInfo current in legacyArray)
        {
            if (mostRecent == null)
            {
                mostRecent = current;
            }

            if (current.LastWriteTimeUtc >= mostRecent.LastWriteTimeUtc)
            {
                mostRecent = current;
            }
        }

        return mostRecent;

        // To make the code below work, you'd need to edit the properties of the project,
        // change the TargetFramework to 3.5 or later and add "using System.Linq;".
        // OrderByDescending is a LINQ extension method, so it doesn't exist for the 2.0 framework.
        //return directoryInfo.GetFiles(fileExtension)
        //     .OrderByDescending(q => q.LastWriteTimeUtc)
        //     .FirstOrDefault();
    }

    #region ScriptResults declaration
    /// <summary>
    /// This enum provides a convenient shorthand within the scope of this class for setting the
    /// result of the script.
    /// 
    /// This code was generated automatically.
    /// </summary>
    enum ScriptResults
    {
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    };
    #endregion

}

Updating a Connection Manager

At this point, our script has assigned a value to the CurrentFile variable. The next step is to tell SSIS to use that file. In the Connection Manager for your CSV, set an Expression (F4, or right click and select Properties) on the ConnectionString property. The value you want to assign is our CurrentFile variable, which is expressed as @[User::CurrentFile].

(Screenshot: Assign connection string)

Finally, these screen shots are based on the upcoming release of SQL Server 2012 so the icons may appear different but the functionality remains the same.

billinkc answered Nov 07 '22 17:11


Assuming that you wanted to use C#, to get the newest file in a given directory, you can use a method like this...

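    // Requires: using System.IO; using System.Linq; (LINQ needs .NET 3.5 or later)
    private static FileInfo GetLatestFile(string directoryName, string fileExtension)
    {
        DirectoryInfo directoryInfo = new DirectoryInfo(directoryName);

        // Sort newest first by last write time; FirstOrDefault yields null when nothing matches
        return directoryInfo.GetFiles(fileExtension)
             .OrderByDescending(q => q.LastWriteTimeUtc)
             .FirstOrDefault();
    }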

This method is called like...

    FileInfo file = GetLatestFile(@"C:\myDirectory", "*.csv");

And it returns a FileInfo instance (or null) of the file with the most recent write time. You can then use the FileInfo instance to get the name of the file and so on for your processing...
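Because the result can be null, the caller should check it before touching any FileInfo members. Here is a small self-contained sketch of that check. LatestFileDemo and DescribeLatest are hypothetical names used only for illustration, and the lookup is written as a plain loop so the sample also compiles without LINQ.

```csharp
using System;
using System.IO;

public static class LatestFileDemo
{
    // Loop-based equivalent of the lookup: newest file by last write time, or null.
    public static FileInfo GetLatestFile(string directoryName, string fileMask)
    {
        FileInfo mostRecent = null;
        foreach (FileInfo current in new DirectoryInfo(directoryName).GetFiles(fileMask))
        {
            if (mostRecent == null || current.LastWriteTimeUtc > mostRecent.LastWriteTimeUtc)
            {
                mostRecent = current;
            }
        }
        return mostRecent;
    }

    // Returns the full path of the newest match, or null when the folder has no match.
    public static string DescribeLatest(string directoryName, string fileMask)
    {
        FileInfo file = GetLatestFile(directoryName, fileMask);
        return file == null ? null : file.FullName;
    }
}
```

In a package you would typically treat the null case as "nothing to import" and either skip the data flow or fail the task, depending on what your process expects.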

Gayot Fow answered Nov 07 '22 17:11