I'm reading a CSV file and the records are recorded as a string[]. I want to take each record and convert it into a custom object.
T GetMyObject<T>();
Currently I'm doing this through reflection, which is really slow. I'm testing with a 515 MB file with several million records. It takes under 10 seconds to parse. It takes under 20 seconds to create the custom objects using manual conversions with Convert.ToSomeType, but around 4 minutes to do the conversion to the objects through reflection.
What is a good way to handle this automatically?
It seems a lot of time is spent in the PropertyInfo.SetValue method. I tried caching each property's setter MethodInfo and using that instead, but it was actually slower.
I have also tried converting that into a delegate, as the great Jon Skeet suggested here: "Improving performance reflection, what alternatives should I consider". The problem is that I don't know what the property type is ahead of time. I'm able to get the delegate:
var myObject = Activator.CreateInstance<T>();
foreach( var property in typeof( T ).GetProperties() )
{
    var d = Delegate.CreateDelegate(
        typeof( Action<,> ).MakeGenericType( typeof( T ), property.PropertyType ),
        property.GetSetMethod() );
}
The problem here is I can't cast the delegate into a concrete type like Action<T, int>, because the property type of int isn't known ahead of time.
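One possible way around the cast problem is to let a generic helper method do the cast, and to call that helper through reflection once per property. The sketch below is only an illustration under that assumption: DelegateSetterFactory, CreateTyped, and DelegateSample are hypothetical names, and the wrapper boxes value types on every call.

```csharp
using System;
using System.Reflection;

// Hypothetical sample type, used only to exercise the factory.
public class DelegateSample
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class DelegateSetterFactory
{
    // Returns a setter that accepts the value as object, so the caller
    // never needs to know the property type at compile time.
    public static Action<T, object> Create<T>(PropertyInfo property) where T : class
    {
        // Close the generic helper over the real property type (one-time reflection cost).
        MethodInfo helper = typeof(DelegateSetterFactory)
            .GetMethod("CreateTyped", BindingFlags.Static | BindingFlags.NonPublic)
            .MakeGenericMethod(typeof(T), property.PropertyType);
        return (Action<T, object>)helper.Invoke(null, new object[] { property });
    }

    // Inside this method TValue is known, so the strongly typed cast is legal.
    private static Action<T, object> CreateTyped<T, TValue>(PropertyInfo property) where T : class
    {
        var typed = (Action<T, TValue>)Delegate.CreateDelegate(
            typeof(Action<T, TValue>), property.GetSetMethod());
        // Wrap the typed delegate; the cast unboxes value types.
        return (target, value) => typed(target, (TValue)value);
    }
}
```

The setters would be created once per property and cached, then invoked for every record.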
You can use reflection to dynamically create an instance of a type, bind the type to an existing object, or get the type from an existing object and invoke its methods or access its fields and properties. If you are using attributes in your code, reflection enables you to access them.
In Java, adding a setAccessible(true) call makes these reflection calls faster, but even then each call takes 5.5 nanoseconds. Reflection is 104% slower than direct access (so about twice as slow). It also takes longer to warm up.
Never use reflection in production code!
The first thing I'd say is write some sample code manually that tells you what the absolute best case you can expect is - see if your current code is worth fixing.
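A baseline comparison along those lines could look like the following sketch. SampleEntity, BaselineComparison, and the two-column row are assumptions made up for illustration; the timings depend entirely on the machine and runtime, so none are claimed here.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;
using System.Reflection;

// Hypothetical record type used only for this comparison.
public class SampleEntity
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class BaselineComparison
{
    // Hand-written conversion: the best case an automatic mapper can approach.
    public static SampleEntity ManualConvert(string[] row)
    {
        return new SampleEntity
        {
            Id = int.Parse(row[0], CultureInfo.InvariantCulture),
            Name = row[1]
        };
    }

    // Reflection-based conversion through PropertyInfo.SetValue.
    public static SampleEntity ReflectionConvert(string[] row, PropertyInfo[] props)
    {
        var entity = new SampleEntity();
        for (int i = 0; i < props.Length; i++)
        {
            object value = Convert.ChangeType(
                row[i], props[i].PropertyType, CultureInfo.InvariantCulture);
            props[i].SetValue(entity, value, null);
        }
        return entity;
    }

    // Runs both conversions over the same row and reports elapsed milliseconds.
    public static void Measure(int iterations, out long manualMs, out long reflectionMs)
    {
        string[] row = { "42", "Fred" };
        PropertyInfo[] props =
        {
            typeof(SampleEntity).GetProperty("Id"),
            typeof(SampleEntity).GetProperty("Name")
        };

        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) ManualConvert(row);
        manualMs = watch.ElapsedMilliseconds;

        watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) ReflectionConvert(row, props);
        reflectionMs = watch.ElapsedMilliseconds;
    }
}
```

If the manual figure is already close to the total you can tolerate, the gap between the two numbers shows how much there is to gain from replacing the reflection path.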
If you are using PropertyInfo.SetValue etc., then absolutely you can make it quicker, even with just object - HyperDescriptor might be a good start (it is significantly faster than raw reflection, but without making the code any more complicated).
For optimal performance, dynamic IL methods are the way to go (precompiled once); in 2.0/3.0, maybe DynamicMethod, but in 3.5 I'd favor Expression (with Compile()). Let me know if you want more detail.
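As a rough illustration of the Expression route for a single property, the sketch below compiles a setter that takes the value as object, which sidesteps the unknown property type. SetterFactory and SetterSample are hypothetical names. The body is written as a call to the set method rather than with Expression.Assign, on the assumption that it should also build against .NET 3.5.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical sample type, used only to exercise the factory.
public class SetterSample
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class SetterFactory
{
    // Compiles (target, value) => target.set_Property((PropertyType)value).
    public static Action<T, object> Create<T>(PropertyInfo property) where T : class
    {
        ParameterExpression target = Expression.Parameter(typeof(T), "target");
        ParameterExpression value = Expression.Parameter(typeof(object), "value");

        // Cast (or unbox) the object argument to the real property type.
        Expression typedValue = Expression.Convert(value, property.PropertyType);

        // Invoke the property's set method on the target instance.
        MethodCallExpression call =
            Expression.Call(target, property.GetSetMethod(), typedValue);

        return Expression.Lambda<Action<T, object>>(call, target, value).Compile();
    }
}
```

Compile() is the expensive step, so the resulting delegates should be built once per property and reused for every record.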
Implementation using Expression and CsvReader, that uses the column headers to provide the mapping (it invents some data along the same lines); it uses IEnumerable<T> as the return type to avoid having to buffer the data (since you seem to have quite a lot of it):
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;
using LumenWorks.Framework.IO.Csv;

class Entity
{
    public string Name { get; set; }
    public DateTime DateOfBirth { get; set; }
    public int Id { get; set; }
}

static class Program
{
    static void Main()
    {
        string path = "data.csv";
        InventData(path);
        int count = 0;
        foreach (Entity obj in Read<Entity>(path))
        {
            count++;
        }
        Console.WriteLine(count);
    }

    static IEnumerable<T> Read<T>(string path)
        where T : class, new()
    {
        using (TextReader source = File.OpenText(path))
        using (CsvReader reader = new CsvReader(source, true, delimiter))
        {
            string[] headers = reader.GetFieldHeaders();
            Type type = typeof(T);
            List<MemberBinding> bindings = new List<MemberBinding>();
            ParameterExpression param = Expression.Parameter(typeof(CsvReader), "row");
            // Getter of the CsvReader indexer that takes a column index: row[i]
            MethodInfo method = typeof(CsvReader).GetProperty("Item", new[] { typeof(int) }).GetGetMethod();
            Expression invariantCulture = Expression.Constant(
                CultureInfo.InvariantCulture, typeof(IFormatProvider));
            for (int i = 0; i < headers.Length; i++)
            {
                // Map each column header to the field or property of the same name.
                MemberInfo member = type.GetMember(headers[i]).Single();
                Type finalType;
                switch (member.MemberType)
                {
                    case MemberTypes.Field: finalType = ((FieldInfo)member).FieldType; break;
                    case MemberTypes.Property: finalType = ((PropertyInfo)member).PropertyType; break;
                    default: throw new NotSupportedException();
                }
                Expression val = Expression.Call(
                    param, method, Expression.Constant(i, typeof(int)));
                if (finalType != typeof(string))
                {
                    // finalType.Parse(row[i], CultureInfo.InvariantCulture)
                    val = Expression.Call(
                        finalType, "Parse", null, val, invariantCulture);
                }
                bindings.Add(Expression.Bind(member, val));
            }
            // row => new T { Member0 = ..., Member1 = ..., ... }, compiled once.
            Expression body = Expression.MemberInit(
                Expression.New(type), bindings);
            Func<CsvReader, T> func = Expression.Lambda<Func<CsvReader, T>>(body, param).Compile();
            while (reader.ReadNextRecord())
            {
                yield return func(reader);
            }
        }
    }

    const char delimiter = '\t';

    // Writes a tab-delimited test file with a header row and 10000 records.
    static void InventData(string path)
    {
        Random rand = new Random(123456);
        using (TextWriter dest = File.CreateText(path))
        {
            dest.WriteLine("Id" + delimiter + "DateOfBirth" + delimiter + "Name");
            for (int i = 0; i < 10000; i++)
            {
                dest.Write(rand.Next(5000000));
                dest.Write(delimiter);
                dest.Write(new DateTime(
                    rand.Next(1960, 2010),
                    rand.Next(1, 13),
                    rand.Next(1, 28)).ToString(CultureInfo.InvariantCulture));
                dest.Write(delimiter);
                dest.Write("Fred");
                dest.WriteLine();
            }
        }
    }
}
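Since the question stores each record as a string[], the same idea can be sketched without the CsvReader dependency. This is an assumption-laden variant, not the code above: RowMapper and Person are hypothetical names, only properties are mapped, and every non-string property type is assumed to expose a static Parse(string, IFormatProvider) method.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical target type, used only for illustration.
public class Person
{
    public string Name { get; set; }
    public DateTime DateOfBirth { get; set; }
    public int Id { get; set; }
}

public static class RowMapper
{
    // Builds row => new T { Header0 = Parse(row[0]), ... } once for a header layout.
    public static Func<string[], T> Build<T>(string[] headers) where T : class, new()
    {
        ParameterExpression row = Expression.Parameter(typeof(string[]), "row");
        Expression culture = Expression.Constant(
            CultureInfo.InvariantCulture, typeof(IFormatProvider));
        var bindings = new List<MemberBinding>();

        for (int i = 0; i < headers.Length; i++)
        {
            PropertyInfo property = typeof(T).GetProperty(headers[i]);
            if (property == null)
                throw new ArgumentException("No property named " + headers[i]);

            // row[i]
            Expression value = Expression.ArrayIndex(row, Expression.Constant(i));

            if (property.PropertyType != typeof(string))
            {
                // Look up PropertyType.Parse(string, IFormatProvider) explicitly.
                MethodInfo parse = property.PropertyType.GetMethod(
                    "Parse", new[] { typeof(string), typeof(IFormatProvider) });
                if (parse == null)
                    throw new NotSupportedException(property.PropertyType.Name);
                value = Expression.Call(parse, value, culture);
            }
            bindings.Add(Expression.Bind(property, value));
        }

        Expression body = Expression.MemberInit(Expression.New(typeof(T)), bindings);
        return Expression.Lambda<Func<string[], T>>(body, row).Compile();
    }
}
```

A caller would build the delegate once from the header row, for example `var map = RowMapper.Build<Person>(headers);`, and then call `map(record)` for each string[] record.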
Second version (see comments) that uses TypeConverter rather than Parse:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;
using LumenWorks.Framework.IO.Csv;

class Entity
{
    public string Name { get; set; }
    public DateTime DateOfBirth { get; set; }
    public int Id { get; set; }
}

static class Program
{
    static void Main()
    {
        string path = "data.csv";
        InventData(path);
        int count = 0;
        foreach (Entity obj in Read<Entity>(path))
        {
            count++;
        }
        Console.WriteLine(count);
    }

    static IEnumerable<T> Read<T>(string path)
        where T : class, new()
    {
        using (TextReader source = File.OpenText(path))
        using (CsvReader reader = new CsvReader(source, true, delimiter))
        {
            string[] headers = reader.GetFieldHeaders();
            Type type = typeof(T);
            List<MemberBinding> bindings = new List<MemberBinding>();
            ParameterExpression param = Expression.Parameter(typeof(CsvReader), "row");
            // Getter of the CsvReader indexer that takes a column index: row[i]
            MethodInfo method = typeof(CsvReader).GetProperty("Item", new[] { typeof(int) }).GetGetMethod();
            // One shared converter constant per target type.
            var converters = new Dictionary<Type, ConstantExpression>();
            for (int i = 0; i < headers.Length; i++)
            {
                // Map each column header to the field or property of the same name.
                MemberInfo member = type.GetMember(headers[i]).Single();
                Type finalType;
                switch (member.MemberType)
                {
                    case MemberTypes.Field: finalType = ((FieldInfo)member).FieldType; break;
                    case MemberTypes.Property: finalType = ((PropertyInfo)member).PropertyType; break;
                    default: throw new NotSupportedException();
                }
                Expression val = Expression.Call(
                    param, method, Expression.Constant(i, typeof(int)));
                if (finalType != typeof(string))
                {
                    ConstantExpression converter;
                    if (!converters.TryGetValue(finalType, out converter))
                    {
                        converter = Expression.Constant(TypeDescriptor.GetConverter(finalType));
                        converters.Add(finalType, converter);
                    }
                    // (finalType)converter.ConvertFromInvariantString(row[i])
                    val = Expression.Convert(
                        Expression.Call(converter, "ConvertFromInvariantString", null, val),
                        finalType);
                }
                bindings.Add(Expression.Bind(member, val));
            }
            // row => new T { Member0 = ..., Member1 = ..., ... }, compiled once.
            Expression body = Expression.MemberInit(
                Expression.New(type), bindings);
            Func<CsvReader, T> func = Expression.Lambda<Func<CsvReader, T>>(body, param).Compile();
            while (reader.ReadNextRecord())
            {
                yield return func(reader);
            }
        }
    }

    const char delimiter = '\t';

    // Writes a tab-delimited test file with a header row and 10000 records.
    static void InventData(string path)
    {
        Random rand = new Random(123456);
        using (TextWriter dest = File.CreateText(path))
        {
            dest.WriteLine("Id" + delimiter + "DateOfBirth" + delimiter + "Name");
            for (int i = 0; i < 10000; i++)
            {
                dest.Write(rand.Next(5000000));
                dest.Write(delimiter);
                dest.Write(new DateTime(
                    rand.Next(1960, 2010),
                    rand.Next(1, 13),
                    rand.Next(1, 28)).ToString(CultureInfo.InvariantCulture));
                dest.Write(delimiter);
                dest.Write("Fred");
                dest.WriteLine();
            }
        }
    }
}
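To see in isolation what the TypeConverter call in the second version does, here is a minimal sketch. ConverterDemo and FromInvariantString are hypothetical names; the TypeDescriptor and TypeConverter calls are the same ones used above.

```csharp
using System;
using System.ComponentModel;

public static class ConverterDemo
{
    // Converts text to targetType using the converter registered for that type.
    public static object FromInvariantString(Type targetType, string text)
    {
        TypeConverter converter = TypeDescriptor.GetConverter(targetType);
        return converter.ConvertFromInvariantString(text);
    }
}
```

For example, `ConverterDemo.FromInvariantString(typeof(int), "42")` returns a boxed int. The practical reason to prefer this over Parse is coverage: types such as enums or Nullable<int> do not expose a static Parse(string, IFormatProvider) method, but they do have type converters. The trade-off is that the result comes back as object, so value types are boxed and then unboxed by the Expression.Convert step.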
You should make a DynamicMethod or an expression tree and build statically typed code at runtime.
This will incur a rather large setup cost, but no per-object overhead at all.
However, it's somewhat difficult to do, and will result in complicated code that is difficult to debug.
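To give a sense of what the DynamicMethod option involves, the sketch below emits a small setter for one property. It is an illustration only: EmitSetterFactory and EmitSample are hypothetical names, T is assumed to be a reference type, and the property is assumed to have a public setter.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical sample type, used only to exercise the factory.
public class EmitSample
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class EmitSetterFactory
{
    // Emits IL equivalent to: (target, value) => target.Property = (PropertyType)value
    public static Action<T, object> Create<T>(PropertyInfo property) where T : class
    {
        MethodInfo setter = property.GetSetMethod();
        var method = new DynamicMethod(
            "Set_" + property.Name,
            typeof(void),
            new[] { typeof(T), typeof(object) },
            typeof(T).Module,
            true);

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);                          // push the target instance
        il.Emit(OpCodes.Ldarg_1);                          // push the value as object
        il.Emit(OpCodes.Unbox_Any, property.PropertyType); // unbox or cast to the property type
        il.Emit(OpCodes.Callvirt, setter);                 // call the property setter
        il.Emit(OpCodes.Ret);

        return (Action<T, object>)method.CreateDelegate(typeof(Action<T, object>));
    }
}
```

The emitted delegate is created once per property and reused. A mistake in the IL typically surfaces only at run time, often as an InvalidProgramException, which is part of why this route is harder to debug than the expression-tree one.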