
Parsing variable length strings of fixed column widths C#

I am trying to parse a text report that is formatted into columns. Each column appears to be right-justified with a fixed width. Not every line uses all of the columns; in that case, spaces appear to be used to keep each column aligned. Example input:

031   91    1221,154
043   66     312,222    1      3,047                       3,047    1.5%    .9%
040  118     529,626    1      1,842                       1,842     .8%    .3%
037   45     427,710
019   80     512,153    1     14,685                      14,685    1.2%   2.8%
009   68     520,301                      1    16,085     16,085    1.4%   3.0%
030   13     106,689                      1     1,581      1,581    7.6%   1.4%
008   54     377,593    1      7,098                       7,098    1.8%   1.8%
018   24     171,264
022   25       8,884    1        433                         433    4.0%   4.8%
035    9      42,043
041   13     112,355

The column widths appear to be as follows (in characters, including whitespace): 3, 5, 12, 6, 10, 7, 10, 11, 8, 7.

What is a good way to parse this? I have tried using a regular expression to do it, but it obviously fails on the first line being read in because I am using an expression that expects the whole line to have data:

string pattern = @"^(.{3})(.{5})(.{12})(?<thirtyeightyninenumber>.{6})(.{10})(.{7})(.{10})(.{11})(.{8})(.{7})";

Looking for a good way to read this into appropriate variables depending on whether that column has data or not. I feel like I need to throw a bunch of if checks in, but am hoping there is a better way I am not thinking of.

Thanks for any help.

BTW - I am reading the lines using a StreamReader and ReadLine.

Shawn asked Sep 20 '12 14:09


2 Answers

There is a TextFieldParser available that is specifically meant for reading fixed-width/delimited text files like this.

It's in the Microsoft.VisualBasic.FileIO namespace, but you can still call it from C#.

Add a reference to Microsoft.VisualBasic and a using Microsoft.VisualBasic.FileIO; directive, then the code looks like this:

// The using block closes the parser even if parsing throws.
using (TextFieldParser parser = new TextFieldParser(stream))
{
    parser.TextFieldType = FieldType.FixedWidth;
    parser.SetFieldWidths(3, 5, 12, 6, 10, 7, 10, 11, 8, 7);
    while (!parser.EndOfData)
    {
        // Read one row, split at the fixed column boundaries
        string[] fields = parser.ReadFields();

        // Treat each field appropriately e.g. int.TryParse,
        // remove the "%" then float.TryParse etc.
    }
}
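
The "treat each field appropriately" step could look something like the sketch below. It is only an illustration: the FieldConverters class and its method names are made up here, blank columns are assumed to map to null, and decimal is used in place of the float mentioned above to avoid binary rounding.

```csharp
using System;
using System.Globalization;

public static class FieldConverters
{
    // Parses a count such as "312,222"; returns null when the column is blank or not numeric.
    public static int? ParseCount(string field)
    {
        string text = field.Trim();
        if (text.Length == 0) return null;
        int value;
        return int.TryParse(text, NumberStyles.AllowThousands, CultureInfo.InvariantCulture, out value)
            ? value
            : (int?)null;
    }

    // Parses a percentage such as "1.5%" or ".9%"; returns null when the column is blank or not numeric.
    public static decimal? ParsePercent(string field)
    {
        string text = field.Trim().TrimEnd('%');
        if (text.Length == 0) return null;
        decimal value;
        return decimal.TryParse(text, NumberStyles.AllowDecimalPoint, CultureInfo.InvariantCulture, out value)
            ? value
            : (decimal?)null;
    }
}
```

Inside the loop you would then call, for example, FieldConverters.ParseCount(fields[2]) and FieldConverters.ParsePercent(fields[8]), and check for null before using the value.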

Edit: That said, looking in Reflector, I think this fails if your shortened lines don't have a full width worth of spaces. I'm not sure how to suggest you fix this; you could pre-process your stream to insert any missing spaces per line?
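
A rough sketch of that pre-processing idea, assuming the widths from the question (which add up to 79 characters). The LinePadding class and its method names are hypothetical, not part of any library:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

public static class LinePadding
{
    // Column widths as stated in the question.
    private static readonly int[] Widths = { 3, 5, 12, 6, 10, 7, 10, 11, 8, 7 };

    // Pads a short line with trailing spaces up to the full record width.
    // Lines that are already long enough are returned unchanged.
    public static string PadToFullWidth(string line)
    {
        return line.PadRight(Widths.Sum());
    }

    // Reads every line from the reader and returns the padded text.
    public static string PadAllLines(TextReader reader)
    {
        var builder = new StringBuilder();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            builder.AppendLine(PadToFullWidth(line));
        }
        return builder.ToString();
    }
}
```

The padded text could then be wrapped in a StringReader and handed to the TextFieldParser constructor that accepts a TextReader. For a large report this buffers the whole file in memory, so padding line by line may be preferable.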

Rawling answered Nov 19 '22 20:11


Don't use regular expressions for this. You know the number of columns and the widths of those columns, so just use String.Substring and String.Trim:

// Offsets follow the widths from the question: 3, 5, 12, ...
string field1 = line.Substring(0, 3).Trim();
string field2 = line.Substring(3, 5).Trim();
string field3 = line.Substring(8, 12).Trim();
/* etc, etc */
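
One caveat: Substring throws an ArgumentOutOfRangeException when the requested range runs past the end of the string, which is what happens on the shortened lines. A minimal sketch of a loop that clamps each slice instead (the FixedWidthSlicer class and Split method are names invented for this example):

```csharp
using System;

public static class FixedWidthSlicer
{
    // Splits a line into trimmed fields using the given column widths.
    // Columns that lie beyond the end of a short line come back as empty strings.
    public static string[] Split(string line, int[] widths)
    {
        var fields = new string[widths.Length];
        int start = 0;
        for (int i = 0; i < widths.Length; i++)
        {
            // Clamp the slice so Substring never reads past the end of the line.
            int available = Math.Max(0, Math.Min(widths[i], line.Length - start));
            fields[i] = available > 0
                ? line.Substring(start, available).Trim()
                : string.Empty;
            start += widths[i];
        }
        return fields;
    }
}
```

Called as FixedWidthSlicer.Split(line, new[] { 3, 5, 12, 6, 10, 7, 10, 11, 8, 7 }), this always returns ten fields, so the only check needed per column is whether the field is empty.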

Sean Bright answered Nov 19 '22 18:11