Parsing a CSV formatted text file

Tags: c#, parsing

I have a text file that looks like this:

1,Smith, 249.24, 6/10/2010
2,Johnson, 1332.23, 6/11/2010
3,Woods, 2214.22, 6/11/2010
1,Smith, 219.24, 6/11/2010

I need to be able to find the balance for a client on a given date.

I'm wondering if I should:

A. Start from the end and read each line into an Array, one at a time. Check the last name index to see if it is the client we're looking for. Then, display the balance index of the first match.

or

B. Use RegEx to find a match and display it.

I don't have much experience with RegEx, but I'll learn it if it's a no-brainer in a situation like this.

tpow asked Jun 20 '10 16:06


2 Answers

I would recommend using the open-source FileHelpers project: http://www.filehelpers.net/

Piece of cake:

Define your class:

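[DelimitedRecord(",")]
public class Customer
{
    public int CustId;

    public string Name;

    // The sample has a space after each comma, so trim before converting
    [FieldTrim(TrimMode.Both)]
    public decimal Balance;

    // Dates in the sample look like 6/10/2010 (assumed month/day/year)
    [FieldTrim(TrimMode.Both)]
    [FieldConverter(ConverterKind.Date, "M/d/yyyy")]
    public DateTime AddedDate;

}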

Use it:

var engine = new FileHelperAsyncEngine<Customer>();

// Read
using(engine.BeginReadFile("TestIn.txt"))
{
   // The engine is IEnumerable 
   foreach(Customer cust in engine)
   {
      // your code here
      Console.WriteLine(cust.Name);

      // your condition >> add balance
   }
}

bertelmonster2k answered Sep 23 '22 19:09


This looks like a fairly standard CSV layout, which is easy enough to process. You can actually do it with ADO.NET and the Jet provider, but I think it is probably easier in the long run to process it yourself.

So first off, you want to process the actual text data. I think it is reasonable to assume each record is separated by a newline, so you can use the ReadLine method to get each record:

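// requires: using System.IO;
// The verbatim string (@"...") keeps the backslashes in the path literal
using (var reader = new StreamReader(@"C:\Path\To\file.txt"))
{
    string line;
    // ReadLine returns null at the end of the file
    while ((line = reader.ReadLine()) != null)
    {
        if (line.Trim().Length == 0)
            continue; // skip blank lines instead of stopping at the first one
        // Process Line
    }
}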

And then to process each line, you can split the string on commas and store the values in a data structure. So if you use a data structure like this:

public class MyData
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Balance { get; set; }
    public DateTime Date { get; set; }
}

then you can process the line data with a method like this:

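// requires: using System.Globalization;
public MyData GetRecord(string line)
{
    var fields = line.Split(',');
    // Trim the padding after each comma and parse with the invariant culture,
    // so "249.24" and "6/10/2010" are read the same way on every machine
    var culture = CultureInfo.InvariantCulture;
    return new MyData()
    {
        Id = int.Parse(fields[0].Trim(), culture),
        Name = fields[1].Trim(),
        Balance = decimal.Parse(fields[2].Trim(), culture),
        Date = DateTime.Parse(fields[3].Trim(), culture)
    };
}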

Now, this is the simplest example, and it doesn't account for cases where fields may be empty. To handle those you would either need to support null for those fields (using the nullable types int?, decimal? and DateTime?) or define a default value to assign instead.
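
As a rough sketch of the nullable approach, something along these lines should work. MyNullableData, TolerantParser and Field are names made up for this illustration, and I'm assuming the dates are month/day/year as in the sample:

```csharp
using System;
using System.Globalization;

// Hypothetical variant of MyData in which every parsed field may be null.
public class MyNullableData
{
    public int? Id { get; set; }
    public string Name { get; set; }
    public decimal? Balance { get; set; }
    public DateTime? Date { get; set; }
}

public static class TolerantParser
{
    // Returns the trimmed field at the given index, or null if it is missing or blank.
    private static string Field(string[] fields, int index)
    {
        if (index >= fields.Length) return null;
        var value = fields[index].Trim();
        return value.Length == 0 ? null : value;
    }

    // Builds a record, leaving a property null when its field is missing,
    // blank, or cannot be parsed (TryParse returns false instead of throwing).
    public static MyNullableData GetRecord(string line)
    {
        var fields = line.Split(',');
        var culture = CultureInfo.InvariantCulture;
        var record = new MyNullableData { Name = Field(fields, 1) };

        int id;
        if (int.TryParse(Field(fields, 0), NumberStyles.Integer, culture, out id))
            record.Id = id;

        decimal balance;
        if (decimal.TryParse(Field(fields, 2), NumberStyles.Number, culture, out balance))
            record.Balance = balance;

        DateTime date;
        if (DateTime.TryParse(Field(fields, 3), culture, DateTimeStyles.None, out date))
            record.Date = date;

        return record;
    }
}
```

With this, a line such as "1,Smith, , 6/10/2010" gives a record whose Balance is null rather than an exception. Whether null or a default value is the better choice depends on what the rest of your code expects.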

So once you have that you can store the collection of MyData objects in a list, and easily perform calculations based on that. So given your example of finding the balance on a given date you could do something like:

var data = customerDataList.First(d => d.Name == customerNameImLookingFor 
                                    && d.Date == dateImLookingFor);

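Where customerDataList is the collection of MyData objects read from the file, customerNameImLookingFor is a variable containing the customer's name, and dateImLookingFor is a variable containing the date. Keep in mind that First throws an InvalidOperationException if no record matches; FirstOrDefault returns null instead.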
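
Putting the pieces together, a minimal end-to-end sketch might look like the following. BalanceRecord, BalanceLookup, Parse and FindBalance are hypothetical names for this example only. I'm also assuming that when a client has several records on the same date you want the one nearest the end of the file, and that "not found" should come back as null rather than an exception:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical record type mirroring the four columns of the sample file.
public class BalanceRecord
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Balance { get; set; }
    public DateTime Date { get; set; }
}

public static class BalanceLookup
{
    // Parses "id,name, balance, date" lines, skipping blank ones.
    public static List<BalanceRecord> Parse(IEnumerable<string> lines)
    {
        var culture = CultureInfo.InvariantCulture;
        var records = new List<BalanceRecord>();
        foreach (var line in lines)
        {
            if (string.IsNullOrWhiteSpace(line)) continue;
            var fields = line.Split(',');
            records.Add(new BalanceRecord
            {
                Id = int.Parse(fields[0].Trim(), culture),
                Name = fields[1].Trim(),
                Balance = decimal.Parse(fields[2].Trim(), culture),
                Date = DateTime.Parse(fields[3].Trim(), culture)
            });
        }
        return records;
    }

    // Returns the client's balance on the given date, or null if there is no
    // matching record. LastOrDefault picks the match closest to the end of the file.
    public static decimal? FindBalance(IEnumerable<BalanceRecord> records,
                                       string name, DateTime date)
    {
        var match = records.LastOrDefault(r => r.Name == name
                                            && r.Date.Date == date.Date);
        return match == null ? (decimal?)null : match.Balance;
    }
}
```

You could feed Parse from File.ReadLines or from the ReadLine loop, then call FindBalance(records, "Smith", new DateTime(2010, 6, 11)) and check the result for null before displaying it.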

I've used this technique in the past to process text files ranging from a couple of records to tens of thousands of records, and it works pretty well.

ckramer answered Sep 23 '22 19:09