We have written a class that imports a largish CSV file into our app using CsvHelper (https://joshclose.github.io/CsvHelper).
I would like to compare the CSV header to the map to verify the header's integrity. We get the CSV file from a third party, and I want to make sure its format doesn't change over time. Comparing the header against the map seemed like the best way to do this.
We have a class set up like so (trimmed):
public class VisitExport
{
    public int? Count { get; set; }
    public string CustomerName { get; set; }
    public string CustomerAddress { get; set; }
}
And its corresponding map (also trimmed):
public class VisitMap : ClassMap<VisitExport>
{
    public VisitMap()
    {
        Map(m => m.Count).Name("Count");
        Map(m => m.CustomerName).Name("Customer Name");
        Map(m => m.CustomerAddress).Name("Customer Address");
    }
}
This is the code I have for reading the CSV file, and it works great. I have a try/catch in place for errors, but ideally, if it fails because of a header mismatch, I'd like to handle that case specifically.
private void fileLoadedLink_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
{
    try
    {
        var filePath = string.Empty;
        data = new List<VisitExport>();
        using (OpenFileDialog openFileDialog = new OpenFileDialog())
        {
            openFileDialog.InitialDirectory = new KnownFolder(KnownFolderType.Downloads).Path;
            openFileDialog.Filter = "csv files (*.csv)|*.csv";
            openFileDialog.FilterIndex = 2;
            openFileDialog.RestoreDirectory = true;
            if (openFileDialog.ShowDialog() == DialogResult.OK)
            {
                filePath = openFileDialog.FileName;
                var fileStream = openFileDialog.OpenFile();
                var culture = CultureInfo.GetCultureInfo("en-GB");
                using (StreamReader reader = new StreamReader(fileStream))
                using (var readCsv = new CsvReader(reader, culture))
                {
                    var map = new VisitMap();
                    readCsv.Context.RegisterClassMap(map);
                    var fileContent = readCsv.GetRecords<VisitExport>();
                    data = fileContent.ToList();
                    fileLoadedLink.Text = filePath;
                    viewModel.IsFileLoaded = true;
                }
            }
        }
    }
    catch (CsvHelperException ex)
    {
        Console.WriteLine(ex.InnerException != null ? ex.InnerException.Message : ex.Message);
        fileLoadedLink.Text = "Error loading file.";
        viewModel.IsFileLoaded = false;
    }
}
Is there a way of comparing the CSV header against my map?
There are two basic cases for CSV files with headers: missing CSV columns, and extra CSV columns. The first is already detected by CsvHelper while the detection of the second is not implemented out of the box and requires subclassing of CsvReader.
(As CsvHelper maps CSV columns to model properties by name, permuting the order of the columns in the CSV file would not be considered a breaking change.)
Note that this only applies to CSV files that actually contain headers. Since you are not setting CsvConfiguration.HasHeaderRecord = false, I assume that this applies to your use case.
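To make the two cases concrete, here is a minimal sketch that uses no CsvHelper types at all. It compares the header names a map expects against the names found in a file and reports both kinds of mismatch. The HeaderDiff class and its Compare method are hypothetical names invented for this illustration, and the exact, case-sensitive string comparison is an assumption; CsvHelper's own matching is configurable and also handles duplicated names, which this sketch ignores.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not part of CsvHelper.
public static class HeaderDiff
{
    // Missing: expected names that do not appear in the file's header.
    // Extra:   header names in the file that no expected name accounts for.
    // Column order is deliberately ignored, since mapping is done by name.
    public static (List<string> Missing, List<string> Extra) Compare(
        IEnumerable<string> expected, IEnumerable<string> actual)
    {
        var expectedList = expected.ToList();
        var actualList = actual.ToList();

        // Assumption: exact, case-sensitive name matching.
        var expectedSet = new HashSet<string>(expectedList, StringComparer.Ordinal);
        var actualSet = new HashSet<string>(actualList, StringComparer.Ordinal);

        var missing = expectedList.Where(h => !actualSet.Contains(h)).Distinct().ToList();
        var extra = actualList.Where(h => !expectedSet.Contains(h)).Distinct().ToList();
        return (missing, extra);
    }
}
```

For example, comparing the expected names "Count", "Customer Name", "Customer Address" against a file header of "Customer Address", "Count", "Notes" reports "Customer Name" as missing and "Notes" as extra, while the reordering of the remaining columns is not reported.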
Details about each of the two cases follow.
Missing CSV columns.
Currently CsvHelper already throws an exception by default in such situations. When unmapped data model properties are found, CsvConfiguration.HeaderValidated is invoked. By default this is set to ConfigurationFunctions.HeaderValidated whose current behavior is to throw a HeaderValidationException if there are any unmapped model properties. You can replace or extend HeaderValidated with logic of your own if you prefer:
var culture = CultureInfo.GetCultureInfo("en-GB");
var config = new CsvConfiguration(culture)
{
    HeaderValidated = (args) =>
    {
        // Add additional logic as required here
        ConfigurationFunctions.HeaderValidated(args);
    },
};
using (var readCsv = new CsvReader(reader, config))
{
    // Remainder unchanged
}
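If the goal is only to handle a header mismatch separately from other read errors, catching HeaderValidationException may be enough. The sketch below reuses the model and map from the question; the VisitImporter class and its TryLoad method are hypothetical names, and the InvalidHeaders and Names members are assumed to be available as in recent CsvHelper versions, so check them against the version you are using.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using CsvHelper.Configuration;

// Model and map as shown in the question.
public class VisitExport
{
    public int? Count { get; set; }
    public string CustomerName { get; set; }
    public string CustomerAddress { get; set; }
}

public sealed class VisitMap : ClassMap<VisitExport>
{
    public VisitMap()
    {
        Map(m => m.Count).Name("Count");
        Map(m => m.CustomerName).Name("Customer Name");
        Map(m => m.CustomerAddress).Name("Customer Address");
    }
}

// Hypothetical helper for illustration only.
public static class VisitImporter
{
    // Returns the records, or null with missingHeaders filled in
    // when the header row fails validation.
    public static List<VisitExport> TryLoad(TextReader reader, out List<string> missingHeaders)
    {
        missingHeaders = new List<string>();
        var culture = CultureInfo.GetCultureInfo("en-GB");
        using (var readCsv = new CsvReader(reader, culture))
        {
            readCsv.Context.RegisterClassMap<VisitMap>();
            try
            {
                // GetRecords is lazy; ToList() forces the read (and the
                // header validation) to happen inside the try block.
                return readCsv.GetRecords<VisitExport>().ToList();
            }
            catch (HeaderValidationException ex)
            {
                // Assumption: InvalidHeaders/Names exist in the CsvHelper version in use.
                missingHeaders = ex.InvalidHeaders.SelectMany(h => h.Names).ToList();
                return null;
            }
        }
    }
}
```

In the question's event handler, the equivalent change would be to add a catch (HeaderValidationException ex) clause before the existing catch (CsvHelperException ex), since the more specific exception type has to be listed first.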
Extra CSV columns.
Currently CsvHelper does not inform the application when this happens. See the issue "Throw if csv contains unexpected columns" (#1032), which confirms that this is not implemented out of the box.
In a GitHub comment, user leopignataro suggests a workaround, which is to subclass CsvReader and add the necessary validation logic oneself. However, the version shown in the comment doesn't seem to handle duplicated column names or embedded references. The following subclass of CsvReader should do this correctly. It is based on the logic in CsvReader.ValidateHeader(ClassMap map, List<InvalidHeader> invalidHeaders). It recursively walks the incoming ClassMap, attempts to find a CSV header corresponding to each member or constructor parameter, and flags the index of each header that is mapped. Afterwards, if any headers remain unmapped, the supplied Action<CsvContext, List<string>> OnUnmappedCsvHeaders is invoked to notify the application of the problem, and it can throw an exception if desired:
using System;
using System.Collections;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using CsvHelper;
using CsvHelper.Configuration;

public class ValidatingCsvReader : CsvReader
{
    public ValidatingCsvReader(TextReader reader, CultureInfo culture, bool leaveOpen = false) : this(new CsvParser(reader, culture, leaveOpen)) { }
    public ValidatingCsvReader(TextReader reader, CsvConfiguration configuration) : this(new CsvParser(reader, configuration)) { }
    public ValidatingCsvReader(IParser parser) : base(parser) { }

    // Invoked with the names of any CSV headers that no map entry accounts for.
    public Action<CsvContext, List<string>> OnUnmappedCsvHeaders { get; set; }

    public override void ValidateHeader(Type type)
    {
        // Let CsvHelper perform its normal check for missing columns first.
        base.ValidateHeader(type);
        var headerRecord = HeaderRecord;
        var mapped = new BitArray(headerRecord.Length);
        var map = Context.Maps[type];
        FlagMappedHeaders(map, mapped);
        var unmappedHeaders = Enumerable.Range(0, headerRecord.Length).Where(i => !mapped[i]).Select(i => headerRecord[i]).ToList();
        if (unmappedHeaders.Count > 0)
        {
            OnUnmappedCsvHeaders?.Invoke(Context, unmappedHeaders);
        }
    }

    protected virtual void FlagMappedHeaders(ClassMap map, BitArray mapped)
    {
        // Logic adapted from https://github.com/JoshClose/CsvHelper/blob/0d753ff09294b425e4bc5ab346145702eeeb1b6f/src/CsvHelper/CsvReader.cs#L157
        // By https://github.com/JoshClose
        foreach (var parameter in map.ParameterMaps)
        {
            if (parameter.Data.Ignore)
                continue;
            if (parameter.Data.IsConstantSet)
                // A constant doesn't require a header.
                continue;
            if (parameter.Data.IsIndexSet && !parameter.Data.IsNameSet)
                // If there is only an index set, we don't want to validate the header name.
                continue;
            if (parameter.ConstructorTypeMap != null)
            {
                FlagMappedHeaders(parameter.ConstructorTypeMap, mapped);
            }
            else if (parameter.ReferenceMap != null)
            {
                FlagMappedHeaders(parameter.ReferenceMap.Data.Mapping, mapped);
            }
            else
            {
                var index = GetFieldIndex(parameter.Data.Names.ToArray(), parameter.Data.NameIndex, true);
                if (index >= 0)
                    mapped.Set(index, true);
            }
        }
        foreach (var memberMap in map.MemberMaps)
        {
            if (memberMap.Data.Ignore || !CanRead(memberMap))
                continue;
            if (memberMap.Data.ReadingConvertExpression != null || memberMap.Data.IsConstantSet)
                // ConvertUsing and Constant don't require a header.
                continue;
            if (memberMap.Data.IsIndexSet && !memberMap.Data.IsNameSet)
                // If there is only an index set, we don't want to validate the header name.
                continue;
            var index = GetFieldIndex(memberMap.Data.Names.ToArray(), memberMap.Data.NameIndex, true);
            if (index >= 0)
                mapped.Set(index, true);
        }
        foreach (var referenceMap in map.ReferenceMaps)
        {
            if (!CanRead(referenceMap))
                continue;
            FlagMappedHeaders(referenceMap.Data.Mapping, mapped);
        }
    }
}
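The reason for flagging header positions rather than comparing sets of names is easier to see in isolation. The sketch below is a simplified stand-in, not CsvHelper code: UnmappedHeaderFinder, Find and IndexOfOccurrence are hypothetical names, and each mapped member is reduced to a name plus the zero-based occurrence of that name it should bind to, which is an assumption about how a name index behaves. A duplicated column is reported as unmapped here, whereas a name-based set comparison would miss it.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical, simplified illustration of the index-flagging approach.
public static class UnmappedHeaderFinder
{
    // mappedNames: one entry per mapped member. NameIndex selects which
    // occurrence (0-based) of a repeated header name the member binds to.
    public static List<string> Find(
        string[] headerRecord, IEnumerable<(string Name, int NameIndex)> mappedNames)
    {
        var mapped = new BitArray(headerRecord.Length);
        foreach (var (name, nameIndex) in mappedNames)
        {
            var index = IndexOfOccurrence(headerRecord, name, nameIndex);
            if (index >= 0)
                mapped.Set(index, true);
        }

        // Any header position that was never flagged has no mapping.
        return Enumerable.Range(0, headerRecord.Length)
            .Where(i => !mapped[i])
            .Select(i => headerRecord[i])
            .ToList();
    }

    // Returns the position of the given occurrence of name, or -1 if absent.
    private static int IndexOfOccurrence(string[] headerRecord, string name, int occurrence)
    {
        var seen = 0;
        for (var i = 0; i < headerRecord.Length; i++)
        {
            if (!string.Equals(headerRecord[i], name, StringComparison.Ordinal))
                continue;
            if (seen == occurrence)
                return i;
            seen++;
        }
        return -1;
    }
}
```

With the header "Count", "Customer Name", "Count", "Customer Address" and one mapping for each of the three names at occurrence 0, only the second "Count" column is left unflagged and is returned as unmapped.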
And then in your code, handle the OnUnmappedCsvHeaders callback however you would like, such as by throwing a CsvHelperException or some other custom exception:
using (var readCsv = new ValidatingCsvReader(reader, culture)
{
    OnUnmappedCsvHeaders = (context, headers) => throw new CsvHelperException(context, string.Format("Unmapped CSV headers: \"{0}\"", string.Join(",", headers))),
})
{
    // Remainder unchanged
}
This could use additional testing, e.g. for data models with parameterized constructors and additional, mutable properties.