Importing data files using generic class definitions

I am trying to import a file that contains multiple record definitions. Each one can also have a header record, so I thought I would define a definition interface like so.

public interface IRecordDefinition<T>
{
    bool Matches(string row);
    T MapRow(string row);
    bool AreRecordsNested { get; }
    GenericLoadClass ToGenericLoad(T input);
}

I then created a concrete implementation for a class.

public class TestDefinition : IRecordDefinition<Test>
{
    public bool Matches(string row)
    {
        return row.Split('\t')[0] == "1";
    }

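    public Test MapRow(string row)
    {
        // Split the tab-delimited row so the method matches the interface signature
        string[] columns = row.Split('\t');
        return new Test {val = columns[0].parseDate("ddmmYYYY")};
    }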

    public bool AreRecordsNested
    {
        get { return true; }
    }

    public GenericLoadClass ToGenericLoad(Test input)
    {
        return new GenericLoadClass {Value = input.val};
    }
}

However, for each file definition I need to store a list of its record definitions, so that I can loop through each line in the file and process it with the definition that matches.

Firstly, am I on the right track, or is there a better way to do it?

Schotime asked Feb 17 '11 01:02




2 Answers

I would split this process into two pieces.

First, a specific process to split the file with multiple types into multiple files. If the files are fixed width, I have had a lot of luck with regular expressions. For example, assume the following is a text file with three different record types.

TE20110223 A 1
RE20110223 BB 2
CE20110223 CCC 3

You can see there is a pattern here; hopefully the person who decided to put all the record types in the same file gave you a way to identify those types. In the case above, you would define three regular expressions.

string pattern1 = @"^TE(?<DATE>[0-9]{8})(?<NEXT1>.{2})(?<NEXT2>.{2})";
string pattern2 = @"^RE(?<DATE>[0-9]{8})(?<NEXT1>.{3})(?<NEXT2>.{2})";
string pattern3 = @"^CE(?<DATE>[0-9]{8})(?<NEXT1>.{4})(?<NEXT2>.{2})";

Regex Regex1 = new Regex(pattern1);
Regex Regex2 = new Regex(pattern2);
Regex Regex3 = new Regex(pattern3);

StringBuilder FirstStringBuilder = new StringBuilder();
StringBuilder SecondStringBuilder = new StringBuilder();
StringBuilder ThirdStringBuilder = new StringBuilder();

string Line = "";
Match LineMatch;


FileInfo myFile = new FileInfo("yourFile.txt");

using (StreamReader s = new StreamReader(myFile.FullName))
{

    while (s.Peek() != -1)
    {
        Line = s.ReadLine();

        LineMatch = Regex1.Match(Line);
        if (LineMatch.Success)
        {
            //Write this line to a new file
        }

        LineMatch = Regex2.Match(Line);
        if (LineMatch.Success)
        {
            //Write this line to a new file
        }

        LineMatch = Regex3.Match(Line);
        if (LineMatch.Success)
        {
            //Write this line to a new file
        }
    }
}
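The "write this line to a new file" step is left open above. A minimal sketch of one way to fill it in follows, assuming one output writer per record type. The names `RecordSplitter`, `Split`, and the "TE"/"RE"/"CE" dictionary keys are hypothetical and not part of the original code; the patterns are the same three shown above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class RecordSplitter
{
    // The three patterns from above, keyed by a hypothetical record-type name.
    public static readonly Dictionary<string, Regex> Patterns = new Dictionary<string, Regex>
    {
        { "TE", new Regex(@"^TE(?<DATE>[0-9]{8})(?<NEXT1>.{2})(?<NEXT2>.{2})") },
        { "RE", new Regex(@"^RE(?<DATE>[0-9]{8})(?<NEXT1>.{3})(?<NEXT2>.{2})") },
        { "CE", new Regex(@"^CE(?<DATE>[0-9]{8})(?<NEXT1>.{4})(?<NEXT2>.{2})") }
    };

    // Copies each input line to the writer registered under the key of the
    // first matching pattern. The caller must supply a writer for every key.
    // Lines that match no pattern are returned so they can be logged.
    public static List<string> Split(TextReader input, IDictionary<string, TextWriter> outputs)
    {
        var unmatched = new List<string>();
        string line;
        while ((line = input.ReadLine()) != null)
        {
            bool matched = false;
            foreach (KeyValuePair<string, Regex> pair in Patterns)
            {
                if (pair.Value.IsMatch(line))
                {
                    outputs[pair.Key].WriteLine(line);
                    matched = true;
                    break;
                }
            }
            if (!matched)
            {
                unmatched.Add(line);
            }
        }
        return unmatched;
    }
}
```

In real use the reader would be a `StreamReader` over the incoming file and each writer a `StreamWriter` over its own split file, all wrapped in `using` blocks. Returning the unmatched lines, rather than dropping them silently, makes it easier to notice when the sender changes the format.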

Next, take the split files and run them through a generic process, that you most likely already have, to import them. This works well because when the process inevitably fails, you can narrow it to the single record type that is failing and not impact all the record types. Archive the main text file along with the split files and your life will be much easier as well.

Dealing with these kinds of transmitted files is hard, because someone else controls them and you never know when they are going to change. Logging the original file as well as a receipt of the import is very important and shouldn't be overlooked either. You can make that as simple or as complex as you want, but I tend to write a receipt to a DB and copy the primary key from that table into a foreign key in the table I have imported the data into, then never change that data. I like to keep an unmodified copy of the import on the file system as well as on the DB server, because there are inevitably conversion and transformation issues that you will need to track down.
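The receipt idea can be sketched in memory, with plain classes standing in for the two database tables. Everything here (`ImportReceipt`, `ImportedRow`, `ReceiptDemo.Import`, and the field names) is a hypothetical illustration, not a schema from this answer; in a real import the receipt `Id` would be the primary key generated by the database.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the receipt table: one row per imported file.
public class ImportReceipt
{
    public int Id;                  // primary key, normally assigned by the DB
    public string SourceFile;       // path of the archived original file
    public DateTime ImportedAtUtc;
    public int RowCount;
}

// Hypothetical stand-in for the data table: each row points at its receipt.
public class ImportedRow
{
    public int ReceiptId;           // foreign key to ImportReceipt.Id
    public string RawLine;          // the line exactly as it was received
}

public static class ReceiptDemo
{
    // Stamps every imported line with the receipt key and records the row count.
    public static List<ImportedRow> Import(ImportReceipt receipt, IEnumerable<string> lines)
    {
        var rows = new List<ImportedRow>();
        foreach (string line in lines)
        {
            rows.Add(new ImportedRow { ReceiptId = receipt.Id, RawLine = line });
        }
        receipt.RowCount = rows.Count;
        return rows;
    }
}
```

With the key on every row, any value that looks wrong later can be traced back to the exact file and import run that produced it.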

Hope this helps, because this is not a trivial task. I think you are on the right track, but instead of processing and importing each line as you read it, write the lines for each record type to a separate file first. I am assuming this is financial data, which is one of the reasons I think provability at every step is important.

Jeremy Gray answered Oct 12 '22 20:10



I think the FileHelpers library solves a number of your problems:

  • Strong types
  • Delimited
  • Fixed-width
  • Record-by-Record operations

I'm sure you could consolidate this into a type hierarchy that could tie in custom binary formats as well.

Richard Nienaber answered Oct 12 '22 20:10
