Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to parse excel rows back to types using EPPlus

Tags:

c#

epplus

EPPlus has a convenient LoadFromCollection<T> method to get data of my own type into a worksheet.

For example if I have a class:

public class Customer {     public int Id { get; set; }     public string Firstname { get; set; }     public string Surname { get; set; }     public DateTime Birthdate { get; set; } } 

Then the following code:

var package = new ExcelPackage(); var sheet = package.Workbook.Worksheets.Add("Customers"); var customers = new List<Customer>{     new Customer{         Id = 1,         Firstname = "John",         Surname = "Doe",         Birthdate = new DateTime(2000, 1, 1)     },     new Customer{         Id = 2,         Firstname = "Mary",         Surname = "Moe",         Birthdate = new DateTime(2001, 2, 2)     } }; sheet.Cells[1, 1].LoadFromCollection(customers); package.Save(); 

...will add 2 rows to a worksheet called "Customers".

My question is if there is a convenient counterpart to extract the rows from excel (for example after some modifications have been made) back into my types.

Something like:

var package = new ExcelPackage(inputStream); var customers = sheet.Dimension.SaveToCollection<Customer>() ?? 

I have

  • been looking through the EPPlus codebase
  • searched for any saving questions
  • searched for any parsing questions
  • seen this question on reading single cells

... but found nothing on how to simply parse the rows to my type.

like image 884
Philip Bijker Avatar asked Oct 30 '15 12:10

Philip Bijker


People also ask

What is the use of EPPlus?

EPPlus is a very helpful open-source 3rd party DLL for writing data to excel. EPPlus supports multiple properties of spreadsheets like cell ranges, cell styling, charts, pictures, shapes, comments, tables, protection, encryption, pivot tables, data validation, conditional formatting, formula calculation, etc.

How do I change the date format in Excel using EPPlus?

Format = "yyyy-mm-dd"; ws. Cells[3, 1]. Value = new DateTime(2014,10,5); ws.

How do I merge cells in EPPlus?

If you want to merge cells dynamically, you can also use: worksheet. Cells[FromRow, FromColumn, ToRow, ToColumn].


2 Answers

Inspired by the above I took it a slightly different route.

  1. I created an attribute and mapped each property to a column.
  2. I use the DTO type to define what I expect each column to be
  3. Allow columns to not be requried
  4. Use EPPlus to convert the types

By doing so it allows me to use traditional model validation, and embrace changes to column headers

-- Usage:

using(FileStream fileStream = new FileStream(_fileName, FileMode.Open)){       ExcelPackage excel = new ExcelPackage(fileStream);       var workSheet = excel.Workbook.Worksheets[RESOURCES_WORKSHEET];        IEnumerable<ExcelResourceDto> newcollection = workSheet.ConvertSheetToObjects<ExcelResourceDto>();       newcollection.ToList().ForEach(x => Console.WriteLine(x.Title));  } 

Dto that maps to excel

public class ExcelResourceDto {     [Column(1)]     [Required]     public string Title { get; set; }      [Column(2)]     [Required]     public string SearchTags { get; set; } } 

This is the attribute definition

[AttributeUsage(AttributeTargets.All)] public class Column : System.Attribute {     public int ColumnIndex { get; set; }       public Column(int column)      {         ColumnIndex = column;     } }  

Extension class to handle mapping rows to DTO

public static class EPPLusExtensions {    public static IEnumerable<T> ConvertSheetToObjects<T>(this ExcelWorksheet worksheet) where T : new()     {          Func<CustomAttributeData, bool> columnOnly = y => y.AttributeType == typeof(Column);          var columns = typeof(T)                 .GetProperties()                 .Where(x => x.CustomAttributes.Any(columnOnly))         .Select(p => new         {             Property = p,             Column = p.GetCustomAttributes<Column>().First().ColumnIndex //safe because if where above         }).ToList();           var rows= worksheet.Cells             .Select(cell => cell.Start.Row)             .Distinct()             .OrderBy(x=>x);           //Create the collection container         var collection = rows.Skip(1)             .Select(row =>             {                 var tnew = new T();                 columns.ForEach(col =>                 {                     //This is the real wrinkle to using reflection - Excel stores all numbers as double including int                     var val = worksheet.Cells[row, col.Column];                     //If it is numeric it is a double since that is how excel stores all numbers                     if (val.Value == null)                     {                         col.Property.SetValue(tnew, null);                         return;                     }                     if (col.Property.PropertyType == typeof(Int32))                     {                         col.Property.SetValue(tnew, val.GetValue<int>());                         return;                     }                     if (col.Property.PropertyType == typeof(double))                     {                         col.Property.SetValue(tnew, val.GetValue<double>());                         return;                     }                     if (col.Property.PropertyType == typeof(DateTime))                     {                         col.Property.SetValue(tnew, val.GetValue<DateTime>());                         return;                     }                     //Its a string                     col.Property.SetValue(tnew, val.GetValue<string>());                 });                  return tnew;             });           //Send it back         return collection;     } } 
like image 102
Nix Avatar answered Oct 04 '22 02:10

Nix


There is no such method native to EPPlus unfortunately. Its a tough nut to crack since you would have to use reflections if you truly want it to be generic. And because of Excel storing all numbers and dates as double you have to deal with alot of unboxing and type checks.

This is something I have been working on. Its an extension method that will do it via Generics. It works but only under limited testing so make sure you check it yourself. I cant guarantee it is the most optimized (yet) but it is pretty decent at his point. You would use it like this:

IEnumerable<TestObject> newcollection = worksheet.ConvertSheetToObjects<TestObject>(); 

The extension:

public static IEnumerable<T> ConvertSheetToObjects<T>(this ExcelWorksheet worksheet) where T:new() {     //DateTime Conversion     var convertDateTime = new Func<double, DateTime>(excelDate =>     {         if (excelDate < 1)             throw new ArgumentException("Excel dates cannot be smaller than 0.");          var dateOfReference = new DateTime(1900, 1, 1);          if (excelDate > 60d)             excelDate = excelDate - 2;         else             excelDate = excelDate - 1;         return dateOfReference.AddDays(excelDate);     });      //Get the properties of T     var tprops = (new T())         .GetType()         .GetProperties()         .ToList();      //Cells only contains references to cells with actual data     var groups = worksheet.Cells         .GroupBy(cell => cell.Start.Row)         .ToList();      //Assume the second row represents column data types (big assumption!)     var types = groups         .Skip(1)         .First()         .Select(rcell => rcell.Value.GetType())         .ToList();      //Assume first row has the column names     var colnames = groups         .First()         .Select((hcell, idx) => new { Name = hcell.Value.ToString(), index = idx })         .Where(o => tprops.Select(p => p.Name).Contains(o.Name))         .ToList();      //Everything after the header is data     var rowvalues = groups         .Skip(1) //Exclude header         .Select(cg => cg.Select(c => c.Value).ToList());       //Create the collection container     var collection = rowvalues         .Select(row =>         {             var tnew = new T();             colnames.ForEach(colname =>             {                 //This is the real wrinkle to using reflection - Excel stores all numbers as double including int                 var val = row[colname.index];                 var type = types[colname.index];                 var prop = tprops.First(p => p.Name == colname.Name);                  //If it is numeric it is a double since that is how excel stores all numbers                 if (type == typeof (double))                 {                     //Unbox it                     var unboxedVal = (double) val;                      //FAR FROM A COMPLETE LIST!!!                     if (prop.PropertyType == typeof (Int32))                         prop.SetValue(tnew, (int) unboxedVal);                     else if (prop.PropertyType == typeof (double))                         prop.SetValue(tnew, unboxedVal);                     else if (prop.PropertyType == typeof (DateTime))                         prop.SetValue(tnew, convertDateTime(unboxedVal));                     else                         throw new NotImplementedException(String.Format("Type '{0}' not implemented yet!", prop.PropertyType.Name));                 }                 else                 {                     //Its a string                     prop.SetValue(tnew, val);                 }             });              return tnew;         });       //Send it back     return collection; } 

A FULL example:

[TestMethod] public void Read_To_Collection_Test() {        //A collection to Test     var objectcollection = new List<TestObject>();      for (var i = 0; i < 10; i++)         objectcollection.Add(new TestObject {Col1 = i, Col2 = i*10, Col3 = Path.GetRandomFileName(), Col4 = DateTime.Now.AddDays(i)});      //Create a test file to convert back     byte[] bytes;     using (var pck = new ExcelPackage())     {         //Load the random data         var workbook = pck.Workbook;         var worksheet = workbook.Worksheets.Add("data");         worksheet.Cells.LoadFromCollection(objectcollection, true);         bytes = pck.GetAsByteArray();     }       //*********************************     //Convert from excel to a collection     using (var pck = new ExcelPackage(new MemoryStream(bytes)))     {         var workbook = pck.Workbook;         var worksheet = workbook.Worksheets["data"];          var newcollection = worksheet.ConvertSheetToObjects<TestObject>();         newcollection.ToList().ForEach(to => Console.WriteLine("{{ Col1:{0}, Col2: {1}, Col3: \"{2}\", Col4: {3} }}", to.Col1, to.Col2, to.Col3, to.Col4.ToShortDateString()));     } }  //test object class public class TestObject {     public int Col1 { get; set; }     public int Col2 { get; set; }     public string Col3 { get; set; }     public DateTime Col4 { get; set; } } 

Console Output:

{ Col1:0, Col2: 0, Col3: "wrulvxbx.wdv", Col4: 10/30/2015 } { Col1:1, Col2: 10, Col3: "wflh34yu.0pu", Col4: 10/31/2015 } { Col1:2, Col2: 20, Col3: "ps0f1jg0.121", Col4: 11/1/2015 } { Col1:3, Col2: 30, Col3: "skoc2gx1.2xs", Col4: 11/2/2015 } { Col1:4, Col2: 40, Col3: "urs3jnbb.ob1", Col4: 11/3/2015 } { Col1:5, Col2: 50, Col3: "m4l2fese.4yz", Col4: 11/4/2015 } { Col1:6, Col2: 60, Col3: "v3dselpn.rqq", Col4: 11/5/2015 } { Col1:7, Col2: 70, Col3: "v2ggbaar.r31", Col4: 11/6/2015 } { Col1:8, Col2: 80, Col3: "da4vd35p.msl", Col4: 11/7/2015 } { Col1:9, Col2: 90, Col3: "v5dtpuad.2ao", Col4: 11/8/2015 } 
like image 20
Ernie S Avatar answered Oct 04 '22 00:10

Ernie S