I have a folder in which I receive .csv files every half hour, with timestamps in the file names. I need to take the latest of the available files and import it into SQL Server.
For example, in my source folder I have:
test_01112012_120122.csv
test_01112012_123022.csv
test_01112012_123555.csv
Now I need to fetch the latest file and import it into SQL Server with the help of SSIS.
Thanks
satish
Step 1: Create a variable VarFolderPath that will contain the path of the folder in which the files exist, and a second variable named VarFileName that will hold the name of the most recent file. Click on Edit Script and write the script below. I have only added the code which is in red.
Drag the “Flat File Source” from the SSIS Toolbox into the “Data Flow” window and rename it as “CSV File”. Double click on this source and select the “Student CSV File” connection manager. Click on Columns on the left side of the screen to review the columns in the file. Click OK.
The code from @garry Vass, or one like it, is going to be needed even if you're using SSIS as your import tool.
Within SSIS, you will need to update the connection string to your flat file connection manager to point to the new file. Ergo, you need to determine what is the most recent file.
Whether you do it by file attributes (Garry's code) or by slicing and dicing file names depends on your business rules. Is it always the most recently modified file (attribute), or does it need to be based on the file name being interpreted as a sequence? This matters if test_01112012_120122.csv had a mistake in it and its contents are later updated. The modified date will change but the file name will not, so if you select by file name those changes wouldn't get ported back into the database.
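If the business rule turns out to be "latest by file name" rather than "latest by modified date", the name can be parsed instead of reading file attributes. The following is only a sketch under assumptions: the class and method names are hypothetical, and it assumes the names follow prefix_ddMMyyyy_HHmmss.csv. The sample names do not reveal whether the date part is day-first or month-first, so adjust the format string to match your feed.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical helper, not part of SSIS or of the answers in this thread.
public static class FileNamePicker
{
    // ASSUMPTION: names look like <prefix>_<ddMMyyyy>_<HHmmss>.csv.
    // Swap to "MMddyyyy_HHmmss" if the feed is month-first.
    private const string StampFormat = "ddMMyyyy_HHmmss";

    // Returns the name (or path) whose embedded timestamp is latest,
    // or null if no name matches the assumed convention.
    public static string GetLatestByName(IEnumerable<string> fileNames)
    {
        string latest = null;
        DateTime latestStamp = DateTime.MinValue;

        foreach (string fileName in fileNames)
        {
            // "test_01112012_123555.csv" -> "test_01112012_123555"
            string stem = Path.GetFileNameWithoutExtension(fileName);
            int firstUnderscore = stem.IndexOf('_');
            if (firstUnderscore < 0)
            {
                continue; // does not follow the assumed convention
            }

            // Everything after the prefix is treated as the timestamp.
            string stampText = stem.Substring(firstUnderscore + 1);
            DateTime stamp;
            bool parsed = DateTime.TryParseExact(
                stampText, StampFormat, CultureInfo.InvariantCulture,
                DateTimeStyles.None, out stamp);

            if (parsed && stamp >= latestStamp)
            {
                latestStamp = stamp;
                latest = fileName;
            }
        }

        return latest;
    }
}
```

Inside a Script Task you could feed it the result of System.IO.Directory.GetFiles(rootFolder, fileMask). For the three sample files from the question it picks test_01112012_123555.csv.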
I would suggest you create 2 variables of type String, scoped to the package, named RootFolder and CurrentFile. Optionally, you can create one called FileMask if you are restricting to a particular type like *.csv. RootFolder would be the base folder you expect to find files in, e.g. C:\ssisdata\MyProject. CurrentFile will be assigned a value from a script of the fully qualified path to the most recently modified file. I find it helpful at this point to assign a design-time value to CurrentFile, usually to the oldest file in the collection.
Drag a Script Task onto the Control Flow and set as your ReadOnlyVariable User::RootFolder (optionally User::FileMask). Your ReadWriteVariable would be User::CurrentFile.
This script would go inside the public partial class ScriptMain: ... braces
    /// <summary>
    /// This verbose script identifies the most recently modified file of type fileMask
    /// living in RootFolder and assigns that to a DTS level variable.
    /// </summary>
    public void Main()
    {
        string fileMask = "*.csv";
        string mostRecentFile = string.Empty;
        string rootFolder = string.Empty;

        // Assign values from the DTS variables collection.
        // This is case sensitive. User:: is not required
        // but you must convert it from the Object type to a strong type
        rootFolder = Dts.Variables["User::RootFolder"].Value.ToString();

        // Repeat the above pattern to assign a value to fileMask if you wish
        // to make it a more flexible approach

        // Determine the most recent file, this could be null
        System.IO.FileInfo candidate = ScriptMain.GetLatestFile(rootFolder, fileMask);
        if (candidate != null)
        {
            mostRecentFile = candidate.FullName;
        }

        // Push the results back onto the variable
        Dts.Variables["CurrentFile"].Value = mostRecentFile;
        Dts.TaskResult = (int)ScriptResults.Success;
    }
    /// <summary>
    /// Find the most recent file matching a pattern
    /// </summary>
    /// <param name="directoryName">Folder to begin searching in</param>
    /// <param name="fileExtension">Extension to search, e.g. *.csv</param>
    /// <returns>The most recently written matching file, or null if none match</returns>
    private static System.IO.FileInfo GetLatestFile(string directoryName, string fileExtension)
    {
        System.IO.DirectoryInfo directoryInfo = new System.IO.DirectoryInfo(directoryName);
        System.IO.FileInfo mostRecent = null;

        // Change the SearchOption to AllDirectories if you need to search subfolders
        System.IO.FileInfo[] legacyArray = directoryInfo.GetFiles(fileExtension, System.IO.SearchOption.TopDirectoryOnly);
        foreach (System.IO.FileInfo current in legacyArray)
        {
            // The first file seen becomes the candidate; later files replace it
            // when their last write time is the same or newer
            if (mostRecent == null || current.LastWriteTimeUtc >= mostRecent.LastWriteTimeUtc)
            {
                mostRecent = current;
            }
        }

        return mostRecent;

        // To make the code below work, you'd need to edit the properties of the project,
        // change the TargetFramework to 3.5 or later and add "using System.Linq;".
        // OrderByDescending is a LINQ extension method, so it doesn't exist for the 2.0 framework
        //return directoryInfo.GetFiles(fileExtension)
        //    .OrderByDescending(q => q.LastWriteTimeUtc)
        //    .FirstOrDefault();
    }
    #region ScriptResults declaration
    /// <summary>
    /// This enum provides a convenient shorthand within the scope of this class for setting the
    /// result of the script.
    ///
    /// This code was generated automatically.
    /// </summary>
    enum ScriptResults
    {
        Success = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Success,
        Failure = Microsoft.SqlServer.Dts.Runtime.DTSExecResult.Failure
    };
    #endregion
}
At this point, our script has assigned a value to the CurrentFile variable. The next step is to tell SSIS to use that file. In the Connection Manager for your CSV, set an Expression (F4, or right-click and select Properties) on the ConnectionString property. The value to assign is our CurrentFile variable, which is written as @[User::CurrentFile].
Finally, these screen shots are based on the upcoming release of SQL Server 2012 so the icons may appear different but the functionality remains the same.
Assuming that you wanted to use C#, to get the newest file in a given directory, you can use a method like this...
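// Needed at the top of the file: FileInfo/DirectoryInfo live in System.IO,
// OrderByDescending/FirstOrDefault are LINQ extension methods (.NET 3.5 or later)
using System.IO;
using System.Linq;

private static FileInfo GetLatestFile(string directoryName, string fileExtension)
{
    DirectoryInfo directoryInfo = new DirectoryInfo(directoryName);
    return directoryInfo.GetFiles(fileExtension)
        .OrderByDescending(q => q.LastWriteTimeUtc)
        .FirstOrDefault();
}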
This method is called like...
FileInfo file = GetLatestFile(@"C:\myDirectory", "*.csv");
And it returns a FileInfo instance (or null) of the file with the most recent write time. You can then use the FileInfo instance to get the name of the file and so on for your processing...
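As a rough illustration of what "use the FileInfo instance" can look like, here is a small console sketch. The class name and the scratch folder under the temp directory are made up for the demo, and the write times are set explicitly so the result does not depend on how quickly the files are created. It repeats the method above so that it compiles on its own.

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical demo class; only GetLatestFile comes from the answer above.
public static class LatestFileDemo
{
    public static FileInfo GetLatestFile(string directoryName, string fileExtension)
    {
        DirectoryInfo directoryInfo = new DirectoryInfo(directoryName);
        return directoryInfo.GetFiles(fileExtension)
            .OrderByDescending(q => q.LastWriteTimeUtc)
            .FirstOrDefault();
    }

    public static void Main()
    {
        // ASSUMPTION: a throwaway folder stands in for the real source folder.
        string folder = Path.Combine(Path.GetTempPath(),
            "latest_file_demo_" + Guid.NewGuid().ToString("N"));
        Directory.CreateDirectory(folder);
        try
        {
            string older = Path.Combine(folder, "older.csv");
            string newer = Path.Combine(folder, "newer.csv");
            File.WriteAllText(older, "a,b");
            File.WriteAllText(newer, "c,d");

            // Pin the write times so the ordering is deterministic.
            File.SetLastWriteTimeUtc(older, new DateTime(2012, 11, 1, 12, 0, 0, DateTimeKind.Utc));
            File.SetLastWriteTimeUtc(newer, new DateTime(2012, 11, 1, 12, 30, 0, DateTimeKind.Utc));

            FileInfo file = GetLatestFile(folder, "*.csv");
            if (file == null)
            {
                // FirstOrDefault yields null when nothing matches the mask.
                Console.WriteLine("No matching files.");
            }
            else
            {
                Console.WriteLine(file.Name);             // file name only
                Console.WriteLine(file.FullName);         // fully qualified path
                Console.WriteLine(file.LastWriteTimeUtc); // last modified (UTC)
            }
        }
        finally
        {
            Directory.Delete(folder, true); // clean up the scratch folder
        }
    }
}
```

In the SSIS scenario, file.FullName is the value you would hand to the flat file connection; the null check matters because an empty folder is a normal case when files arrive on a schedule.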