Importing data files using generic class definitions

Tags: c#, file-io, generics

I am trying to import a file with multiple record definitions in it. Each one can also have a header record, so I thought I would define a definition interface like so.

public interface IRecordDefinition<T>
{
    bool Matches(string row);
    T MapRow(string row);
    bool AreRecordsNested { get; }
    GenericLoadClass ToGenericLoad(T input);
}

I then created a concrete implementation for a class.

public class TestDefinition : IRecordDefinition<Test>
{
    public bool Matches(string row)
    {
        return row.Split('\t')[0] == "1";
    }

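    public Test MapRow(string row)
    {
        // Split the tab-delimited row first so the signature matches IRecordDefinition<T>.MapRow.
        string[] columns = row.Split('\t');
        return new Test {val = columns[0].parseDate("ddmmYYYY")};
    }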

    public bool AreRecordsNested
    {
        get { return true; }
    }

    public GenericLoadClass ToGenericLoad(Test input)
    {
        return new GenericLoadClass {Value = input.val};
    }
}

However, for each file definition I need to store a list of the record definitions so that I can loop through each line in the file and process it accordingly.

Firstly, am I on the right track, or is there a better way to do it?

asked Feb 17 '11 01:02

Schotime

2 Answers

I would split this process into two pieces.

First, a specific process to split the file with multiple types into multiple files. If the files are fixed width, I have had a lot of luck with regular expressions. For example, assume the following is a text file with three different record types.

TE20110223 A 1
RE20110223 BB 2
CE20110223 CCC 3

You can see there is a pattern here. Hopefully the person who decided to put all the record types in the same file gave you a way to identify those types. In the case above, you would define three regular expressions, one per record type.

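// Each pattern keys on the two-character record type, then captures the
// 8-digit date and two fixed-width fields as named groups.
string pattern1 = @"^TE(?<DATE>[0-9]{8})(?<NEXT1>.{2})(?<NEXT2>.{2})";
string pattern2 = @"^RE(?<DATE>[0-9]{8})(?<NEXT1>.{3})(?<NEXT2>.{2})";
string pattern3 = @"^CE(?<DATE>[0-9]{8})(?<NEXT1>.{4})(?<NEXT2>.{2})";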

Regex Regex1 = new Regex(pattern1);
Regex Regex2 = new Regex(pattern2);
Regex Regex3 = new Regex(pattern3);

StringBuilder FirstStringBuilder = new StringBuilder();
StringBuilder SecondStringBuilder = new StringBuilder();
StringBuilder ThirdStringBuilder = new StringBuilder();

string Line = "";
Match LineMatch;


FileInfo myFile = new FileInfo("yourFile.txt");

using (StreamReader s = new StreamReader(myFile.FullName))
{

    while (s.Peek() != -1)
    {
        Line = s.ReadLine();

        LineMatch = Regex1.Match(Line);
        if (LineMatch.Success)
        {
            //Write this line to a new file
        }

        LineMatch = Regex2.Match(Line);
        if (LineMatch.Success)
        {
            //Write this line to a new file
        }

        LineMatch = Regex3.Match(Line);
        if (LineMatch.Success)
        {
            //Write this line to a new file
        }
    }
}
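A fuller sketch of the same splitting step, in case it helps. It reuses the three patterns above but groups lines in a dictionary instead of separate StringBuilders. The class name RecordSplitter, the "UNMATCHED" bucket, and the output naming scheme (yourFile.TE.txt and so on) are assumptions made for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.RegularExpressions;

public static class RecordSplitter
{
    // Record-type prefix -> pattern, reusing the three patterns from the answer.
    private static readonly Dictionary<string, Regex> Patterns = new Dictionary<string, Regex>
    {
        { "TE", new Regex(@"^TE(?<DATE>[0-9]{8})(?<NEXT1>.{2})(?<NEXT2>.{2})") },
        { "RE", new Regex(@"^RE(?<DATE>[0-9]{8})(?<NEXT1>.{3})(?<NEXT2>.{2})") },
        { "CE", new Regex(@"^CE(?<DATE>[0-9]{8})(?<NEXT1>.{4})(?<NEXT2>.{2})") }
    };

    // Groups lines by the pattern that matches them.
    // Lines matching no pattern are kept under "UNMATCHED" (an assumption) so nothing is silently dropped.
    public static Dictionary<string, List<string>> Split(IEnumerable<string> lines)
    {
        var buckets = new Dictionary<string, List<string>>();
        foreach (string line in lines)
        {
            string key = "UNMATCHED";
            foreach (KeyValuePair<string, Regex> entry in Patterns)
            {
                if (entry.Value.IsMatch(line))
                {
                    key = entry.Key;
                    break;
                }
            }

            List<string> bucket;
            if (!buckets.TryGetValue(key, out bucket))
            {
                bucket = new List<string>();
                buckets[key] = bucket;
            }
            bucket.Add(line);
        }
        return buckets;
    }

    // Writes one file per record type into outputDirectory (hypothetical naming scheme).
    public static void SplitFile(string inputPath, string outputDirectory)
    {
        Directory.CreateDirectory(outputDirectory);
        string baseName = Path.GetFileNameWithoutExtension(inputPath);
        foreach (KeyValuePair<string, List<string>> bucket in Split(File.ReadLines(inputPath)))
        {
            string target = Path.Combine(outputDirectory, baseName + "." + bucket.Key + ".txt");
            File.WriteAllLines(target, bucket.Value);
        }
    }
}
```

Calling RecordSplitter.SplitFile("yourFile.txt", "split") would then leave one file per record type in the split directory, each of which can be fed to the import step on its own.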

Next, take the split files and run them through a generic import process, which you most likely already have. This works well because when the process inevitably fails, you can narrow the failure down to a single record type without affecting the others. Archive the main text file along with the split files and your life will be much easier as well.

Dealing with these kinds of transmitted files is hard, because someone else controls them and you never know when they are going to change. Logging the original file as well as a receipt of the import is very important and shouldn't be overlooked either. You can make that as simple or as complex as you want, but I tend to write a receipt to a db and copy the primary key from that table into a foreign key in the table I have imported the data into, then never change that data. I like to keep an unmolested copy of the import on the file system as well as on the DB server, because there are inevitable conversion / transformation issues that you will need to track down.
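To make the receipt idea concrete, here is a minimal sketch. It is only an in-memory stand-in: a real version would insert the receipt into a database table and use that table's primary key. The names ImportReceipt, ImportedRow, and ImportLog, the Guid key, and the archive file naming are all assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public class ImportReceipt
{
    public Guid Id;              // stands in for the receipt table's primary key (assumption)
    public string ArchivedPath;  // location of the untouched copy of the original file
    public DateTime ReceivedUtc;
    public long ByteCount;
}

public class ImportedRow
{
    public Guid ReceiptId;       // plays the role of the foreign key back to the receipt
    public string RawLine;
}

public static class ImportLog
{
    // Copies the original file into the archive directory and returns a receipt describing it.
    public static ImportReceipt Archive(string sourcePath, string archiveDirectory)
    {
        Directory.CreateDirectory(archiveDirectory);

        var receipt = new ImportReceipt { Id = Guid.NewGuid(), ReceivedUtc = DateTime.UtcNow };
        // Prefix the file name with the receipt id so repeated deliveries never overwrite each other.
        receipt.ArchivedPath = Path.Combine(
            archiveDirectory,
            receipt.Id.ToString("N") + "_" + Path.GetFileName(sourcePath));

        File.Copy(sourcePath, receipt.ArchivedPath);
        receipt.ByteCount = new FileInfo(receipt.ArchivedPath).Length;
        return receipt;
    }

    // Stamps every imported line with the receipt id so each row can be traced to its source file.
    public static List<ImportedRow> Stamp(ImportReceipt receipt, IEnumerable<string> lines)
    {
        var rows = new List<ImportedRow>();
        foreach (string line in lines)
        {
            rows.Add(new ImportedRow { ReceiptId = receipt.Id, RawLine = line });
        }
        return rows;
    }
}
```

The point of the design is that the archived copy and the stamped rows are written once and never edited, so any later conversion problem can be traced from a row back to the exact file it came from.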

Hope this helps, because this is not a trivial task. I think you are on the right track, but instead of processing/importing each line separately, write the lines for each record type to a separate file. I am assuming this is financial data, which is one of the reasons I think provability at every step is important.

answered Oct 12 '22 20:10

Jeremy Gray

I think the FileHelpers library solves a number of your problems:

Strong types
Delimited
Fixed-width
Record-by-Record operations

I'm sure you could consolidate this into a type hierarchy that could tie in custom binary formats as well.
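Setting FileHelpers itself aside, the type-hierarchy idea can be sketched in plain C#. The usual obstacle is that IRecordDefinition<Test> and IRecordDefinition<Other> have no common type to put in one list, so this sketch adds a non-generic interface on top. The names RecordDefinition<T>, FileDefinition, and Load are hypothetical, and GenericLoadClass.Value and Test.val are assumed to be strings here because their real types are not shown in the question.

```csharp
using System;
using System.Collections.Generic;

public class GenericLoadClass { public string Value; }   // Value's type is an assumption
public class Test { public string val; }                 // val's type is an assumption

// Non-generic view, so differently typed definitions fit in one list.
public interface IRecordDefinition
{
    bool Matches(string row);
    GenericLoadClass Load(string row);
}

// Generic base class: subclasses supply the typed mapping, Load chains the two steps.
public abstract class RecordDefinition<T> : IRecordDefinition
{
    public abstract bool Matches(string row);
    public abstract T MapRow(string row);
    public abstract GenericLoadClass ToGenericLoad(T input);

    public GenericLoadClass Load(string row)
    {
        return ToGenericLoad(MapRow(row));
    }
}

public class TestDefinition : RecordDefinition<Test>
{
    public override bool Matches(string row) { return row.Split('\t')[0] == "1"; }

    // Takes the second tab-separated column as the value (illustrative choice).
    public override Test MapRow(string row) { return new Test { val = row.Split('\t')[1] }; }

    public override GenericLoadClass ToGenericLoad(Test input)
    {
        return new GenericLoadClass { Value = input.val };
    }
}

public class FileDefinition
{
    private readonly List<IRecordDefinition> _definitions = new List<IRecordDefinition>();

    public FileDefinition Add(IRecordDefinition definition)
    {
        _definitions.Add(definition);
        return this;
    }

    // Runs each line through the first definition that matches it; unmatched lines are skipped.
    public IEnumerable<GenericLoadClass> Process(IEnumerable<string> lines)
    {
        foreach (string line in lines)
        {
            IRecordDefinition match = _definitions.Find(d => d.Matches(line));
            if (match != null)
            {
                yield return match.Load(line);
            }
        }
    }
}
```

A file definition is then built with new FileDefinition().Add(new TestDefinition()) plus one Add call per further record type, and Process can be handed the lines of the file directly.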

answered Oct 12 '22 20:10

Richard Nienaber
