How to read from a Text File Faster/Smarter?

Tags: c#, .net, linq, c#-4.0

I want to know if it is possible to read from a text file in a faster and smarter way.

This is a typical format of my data in a text file:

Call this "part":

ID:1;
FIELD1 :someText;
FIELD2 :someText;
FIELD3 :someText;
FIELD4 :someText;
FIELD5 :someText;
FIELD6 :someText;
FIELD7 :someText;
FIELD8 :someText;
END_ID :
01: someData;
02: someData;
...
...
48: someData;
ENDCARD:

I have thousands of them in a text file.

Is it possible to use LINQ to read it "part" by "part"? I don't want to loop through every single line.

Will it be possible for LINQ to start at ID:1; and end at ENDCARD:?

The reason is that I want to create an object for every "part".

I had something like this in mind:

string[] lines = System.IO.File.ReadAllLines(SomeFilePath);

//Cleaning up the text file of unwanted text
var cleanedUpLines = from line in lines
                     where !line.StartsWith("FIELD1")
                     && !line.StartsWith("FIELD5")
                     && !line.StartsWith("FIELD8")
                     select line.Split(':');

//Here I want to LINQtoText "part" by "part"

//This I do not want to do!!!
foreach (string[] line in cleanedUpLines)
{
}
Willem asked Jan 18 '12 08:01


1 Answer

Here you go:

static void Main()
{
    foreach(var part in ReadParts("Raw.txt"))
    {   // all the fields for the part are available; I'm just showing
        // one of them for illustration
        Console.WriteLine(part["ID"]);
    }
}

static IEnumerable<IDictionary<string,string>> ReadParts(string path)
{
    using(var reader = File.OpenText(path))
    {
        var current = new Dictionary<string, string>();
        string line;
        while((line = reader.ReadLine()) != null)
        {
            if(string.IsNullOrWhiteSpace(line)) continue;
            if(line.StartsWith("ENDCARD:"))
            {
                yield return current;
                current = new Dictionary<string, string>();
            } else
            {
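                // Split on the first ':' only, so values that contain ':' stay intact
                var parts = line.Split(new[] { ':' }, 2);
                if (parts.Length < 2) continue; // skip lines with no key/value separator
                current[parts[0].Trim()] = parts[1].Trim().TrimEnd(';');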
            }
        }
        if (current.Count > 0) yield return current;
    }
}

This creates an iterator block: a state machine that reads and "yields" data as it is iterated, rather than reading the entire file in one go. It scans the lines one at a time; when it reaches the end of a card (ENDCARD:), the card is "yielded"; otherwise the line's key and value are added to a dictionary for storage.
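To make the "does not read the entire file in one go" point concrete, here is a minimal sketch that counts how many lines are actually pulled when only the first part is requested. It assumes the same parsing loop as ReadParts, moved onto a TextReader so it can run against in-memory text; LazyReadDemo, LinesRead and LinesNeededForFirstPart are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LazyReadDemo
{
    // Demo instrumentation only: number of lines pulled from the reader so far.
    public static int LinesRead;

    // Same shape as ReadParts, but over a TextReader (assumption: lets us use StringReader).
    public static IEnumerable<IDictionary<string, string>> ReadParts(TextReader reader)
    {
        var current = new Dictionary<string, string>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            LinesRead++;
            if (string.IsNullOrWhiteSpace(line)) continue;
            if (line.StartsWith("ENDCARD:"))
            {
                yield return current; // execution pauses here until the caller asks for more
                current = new Dictionary<string, string>();
            }
            else
            {
                // Split on the first ':' only; lines without ':' are ignored.
                var parts = line.Split(new[] { ':' }, 2);
                if (parts.Length == 2)
                    current[parts[0].Trim()] = parts[1].Trim().TrimEnd(';');
            }
        }
        if (current.Count > 0) yield return current;
    }

    // Requests only the first part and reports how many lines had to be read for it.
    public static int LinesNeededForFirstPart(string text)
    {
        LinesRead = 0;
        using (var reader = new StringReader(text))
        {
            var sequence = ReadParts(reader); // nothing has been read at this point
            sequence.First();                 // reads up to the first ENDCARD: and stops
            return LinesRead;
        }
    }
}
```

For an input holding two three-line parts, LinesNeededForFirstPart should report only the lines of the first part, because the iterator stops at the first yield once First() has its result.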

Note: if you have your own class that represents the data, then you could use something like reflection or FastMember to set the values by name.
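As a rough sketch of the reflection route mentioned in the note, the following copies dictionary entries onto same-named properties. The Part class and PartMapper helper are hypothetical and not from the original answer; the sketch assumes the property names match the keys in the file exactly and that all values are kept as strings. FastMember is not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical target type; property names are assumed to match the file's keys.
public class Part
{
    public string ID { get; set; }
    public string FIELD2 { get; set; }
    public string FIELD3 { get; set; }
}

public static class PartMapper
{
    // Copies each entry onto a public, writable string property with the same name.
    // Keys with no matching property (for example "01") are silently ignored.
    public static T Map<T>(IDictionary<string, string> values) where T : class, new()
    {
        var result = new T();
        foreach (var pair in values)
        {
            PropertyInfo property = typeof(T).GetProperty(
                pair.Key, BindingFlags.Public | BindingFlags.Instance);
            if (property != null && property.CanWrite && property.PropertyType == typeof(string))
            {
                property.SetValue(result, pair.Value, null);
            }
        }
        return result;
    }
}
```

With this in place, something like ReadParts("Raw.txt").Select(d => PartMapper.Map<Part>(d)) would give a lazy sequence of typed objects. Reflection on every entry has a cost, so for thousands of parts it may be worth caching the PropertyInfo lookups.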

This does not use LINQ directly; however, it is implemented as an enumerable sequence, which is the building block of LINQ-to-Objects, so you can consume it with LINQ, for example:

var data = ReadParts("some.file").Skip(2).First(x => x["ID"] == "123");
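Note that the query above throws if a part has no "ID" key or if no part matches. A more tolerant variant might look like the sketch below; PartQueries, FindById and IdsHaving are hypothetical helpers, not part of the original answer, and they work on any sequence of dictionaries such as the one ReadParts returns.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartQueries
{
    // Returns the first part whose "ID" equals id, or null when nothing matches.
    // Parts without an "ID" key are skipped instead of throwing.
    public static IDictionary<string, string> FindById(
        IEnumerable<IDictionary<string, string>> parts, string id)
    {
        return parts.FirstOrDefault(p =>
        {
            string value;
            return p.TryGetValue("ID", out value) && value == id;
        });
    }

    // Returns the IDs of all parts that contain the given key.
    public static List<string> IdsHaving(
        IEnumerable<IDictionary<string, string>> parts, string key)
    {
        return parts.Where(p => p.ContainsKey("ID") && p.ContainsKey(key))
                    .Select(p => p["ID"])
                    .ToList();
    }
}
```

Because FindById stops at the first match, it only pulls as many parts from the underlying sequence as it needs; IdsHaving has to walk the whole sequence because of ToList().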
Marc Gravell answered Oct 15 '22 00:10