I have some .csv files that I parse before storing their contents in a database.
I would like to make the application more robust by validating the .csv files before anything is saved to the database.
Do you have any good links, code examples, patterns, or advice on how to do this?
I will paste an example of my .csv file below. The data fields in the file are separated by tabs, and each row of data is on its own line.
I have been thinking about what I should validate and came up with the list below (I am very open to other suggestions if you think anything should be added):
Correct file encoding.
The file is not empty.
Correct number of lines/columns.
Correct number/text/date formats.
Correct number ranges.
This is what my .csv file looks like (a file with two lines; the data on each line is separated by tabs):
4523424 A123456 GT-P1000 mobile phone Samsung XSD1234 135354191325234
345353 A134211 A8181 mobile phome HTC S4112-ad3 111911911932343
The string representation of the above looks like this:
"4523424\tA123456\tGT-P1000\tmobile phone\tSamsung\tXSD1234\t135354191325234\r\n345353\tA134211\tA8181\tmobile phome\tHTC\tS4112-ad3\t111911911932343\r\n"
So do you have any good designs, links, patterns, code examples, etc. on how to do this in C#?
This is how I do it:
Create a class to hold each parsed line, with one property of the expected type per field:
internal sealed class Record {
    public int Field1 { get; set; }
    public DateTime Field2 { get; set; }
    public decimal? PossibleEmptyField3 { get; set; }
    ...
}
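Create a method that parses the fields of one line into a record:
using System.Globalization; // NumberStyles, CultureInfo

public Record ParseRecord(string[] fields) {
    // MalformedLineException lives in Microsoft.VisualBasic.FileIO
    if (fields.Length < SomeLineLength)
        throw new MalformedLineException(...);
    var record = new Record();
    // NumberStyles.None accepts digits only: no sign, whitespace, or separators
    record.Field1 = int.Parse(fields[0], NumberStyles.None, CultureInfo.InvariantCulture);
    record.Field2 = DateTime.ParseExact(fields[1], "yyyyMMdd", CultureInfo.InvariantCulture);
    // An empty field leaves the nullable property as null
    if (fields[2] != "")
        record.PossibleEmptyField3 = decimal.Parse(fields[2]...);
    return record;
}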
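If you would rather report which field was bad than catch a FormatException, the TryParse family is an alternative to the Parse calls above. This is only a minimal sketch: the FieldParsers class, its method names, and the error message wording are made up for illustration, and using NumberStyles.Number for the decimal field is an assumption, since the original leaves those arguments out.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical helpers: each returns null and appends a message on failure.
public static class FieldParsers
{
    // Digits only (NumberStyles.None): no sign, whitespace, or separators.
    public static int? ParseInt(string text, string fieldName, List<string> errors)
    {
        if (int.TryParse(text, NumberStyles.None, CultureInfo.InvariantCulture, out int value))
            return value;
        errors.Add($"{fieldName}: '{text}' is not a valid integer");
        return null;
    }

    // Dates must match yyyyMMdd exactly, as in the ParseExact call above.
    public static DateTime? ParseDate(string text, string fieldName, List<string> errors)
    {
        if (DateTime.TryParseExact(text, "yyyyMMdd", CultureInfo.InvariantCulture,
                                   DateTimeStyles.None, out DateTime value))
            return value;
        errors.Add($"{fieldName}: '{text}' is not a date in yyyyMMdd format");
        return null;
    }

    // An empty string means "no value" and is not reported as an error.
    // NumberStyles.Number is an assumption about the accepted decimal format.
    public static decimal? ParseOptionalDecimal(string text, string fieldName, List<string> errors)
    {
        if (text == "")
            return null;
        if (decimal.TryParse(text, NumberStyles.Number, CultureInfo.InvariantCulture, out decimal value))
            return value;
        errors.Add($"{fieldName}: '{text}' is not a valid decimal");
        return null;
    }
}
```

With helpers like these, one pass over a line can collect every bad field instead of stopping at the first one.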
Create a method that parses the entire file:
using Microsoft.VisualBasic.FileIO; // TextFieldParser, MalformedLineException

public List<Record> ParseStream(Stream stream) {
    var tfp = new TextFieldParser(stream);
    ...
    try {
        while (!tfp.EndOfData) {
            records.Add(ParseRecord(tfp.ReadFields()));
        }
    }
    catch (FormatException ex) {
        ... // show error
    }
    catch (MalformedLineException ex) {
        ... // show error
    }
    return records;
}
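For the tab-separated file in the question, the splitting step could look roughly like the sketch below. It assumes TextFieldParser from Microsoft.VisualBasic.FileIO is available (on .NET Framework this needs a reference to the Microsoft.VisualBasic assembly). The TabFileReader class, the ExpectedColumns constant (seven, taken from the sample rows), and the message wording are hypothetical. It collects problems in a list instead of stopping at the first bad line, and it also covers the "file is not empty" and "correct number of columns" checks.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class TabFileReader
{
    // Hypothetical value: the sample rows in the question have seven fields.
    public const int ExpectedColumns = 7;

    // Splits tab-delimited text into rows of fields. Rows with the wrong
    // number of fields are reported in 'errors' and left out of the result.
    public static List<string[]> ReadRows(TextReader reader, List<string> errors)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters("\t");
            // The sample data has no quoted fields, so treat quotes literally.
            parser.HasFieldsEnclosedInQuotes = false;

            // Counts data rows, not physical lines: ReadFields skips blank lines.
            int rowNumber = 0;
            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                rowNumber++;
                if (fields.Length != ExpectedColumns)
                {
                    errors.Add($"Row {rowNumber}: expected {ExpectedColumns} fields, found {fields.Length}");
                    continue;
                }
                rows.Add(fields);
            }
        }

        // No rows and no errors means the input had no data at all.
        if (rows.Count == 0 && errors.Count == 0)
            errors.Add("The file contains no data rows");
        return rows;
    }
}
```

Taking a TextReader rather than a Stream keeps the encoding decision with the caller, for example a StreamReader opened with the encoding you expect.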
Then I create a number of methods that validate the fields:
public void ValidateField2(IEnumerable<Record> records) {
    // Records whose Field2 date lies before today are treated as invalid
    foreach (var invalidRecord in records.Where(x => x.Field2 < DateTime.Today))
        ... // show error
}
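A variation on these validation methods is to return the messages rather than show them, so that all range checks can run before anything is saved. The sketch below repeats a minimal Record so that it compiles on its own. RecordValidator, the limits (Field1 positive, Field3 between 0 and 1000), and the message wording are hypothetical placeholders, and today is passed in as a parameter so the date check is repeatable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal copy of the Record class above, made public for the sketch.
public sealed class Record
{
    public int Field1 { get; set; }
    public DateTime Field2 { get; set; }
    public decimal? PossibleEmptyField3 { get; set; }
}

public static class RecordValidator
{
    // Yields one message per failed rule. All limits are placeholders.
    public static IEnumerable<string> Validate(Record record, int rowNumber, DateTime today)
    {
        if (record.Field1 <= 0)
            yield return $"Row {rowNumber}: Field1 must be positive, got {record.Field1}";

        // Same rule as ValidateField2 above: dates before today are invalid.
        if (record.Field2 < today)
            yield return $"Row {rowNumber}: Field2 {record.Field2:yyyyMMdd} is before {today:yyyyMMdd}";

        // The range check only applies when the optional field has a value.
        if (record.PossibleEmptyField3 is decimal value && (value < 0m || value > 1000m))
            yield return $"Row {rowNumber}: PossibleEmptyField3 {value} is outside 0..1000";
    }

    // Runs every rule against every record; row numbers start at 1.
    public static List<string> ValidateAll(IEnumerable<Record> records, DateTime today)
    {
        return records.SelectMany((record, index) => Validate(record, index + 1, today)).ToList();
    }
}
```

If ValidateAll returns an empty list, the records can go to the database; otherwise the whole file can be rejected with a complete list of problems.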
I have tried various tools, but since the pattern is straightforward they don't help much. (You should still use a tool to split each line into fields.)