Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

.NET OpenXML performance issues

I am attempting to write out an Excel file from an ASP.NET web server using OpenXML. I have about 2100 records and its taking around 20-30 seconds to do this. Any way I can make it faster? Retrieving the 2100 rows from the db takes a fraction of a second. Not sure why manipulating them in memory would take any longer.

Note: ExcelWriter is our custom class, but all its methods are directly from code in this link, http://msdn.microsoft.com/en-us/library/cc861607.aspx

   public static MemoryStream CreateThingReport(List<Thing> things, MemoryStream template)
    {
        SpreadsheetDocument spreadsheet = SpreadsheetDocument.Open(template, true);
        WorksheetPart workSheetPart = spreadsheet.WorkbookPart.WorksheetParts.First();

        SharedStringTablePart sharedStringPart = spreadsheet.WorkbookPart.GetPartsOfType<SharedStringTablePart>().First();

        Cell cell = null;
        int index = 0;

        //create cell formatting for header text
        Alignment wrappedAlignment = new Alignment { WrapText = true };
               uint rowOffset = 2;

  foreach (Thing t in things)
        {
            //Received Date
            cell = ExcelWriter.InsertCellIntoWorksheet("A", rowOffset, workSheetPart);
            index = ExcelWriter.InsertSharedStringItem(t.CreateDate.ToShortDateString(), sharedStringPart);
            cell.CellValue = new CellValue(index.ToString());
            cell.DataType = new DocumentFormat.OpenXml.EnumValue<CellValues>(CellValues.SharedString);

            //Car Part Name
            cell = ExcelWriter.InsertCellIntoWorksheet("B", rowOffset, workSheetPart);
            index = ExcelWriter.InsertSharedStringItem(t.CarPart.Name, sharedStringPart);
            cell.CellValue = new CellValue(index.ToString());
            cell.DataType = new DocumentFormat.OpenXml.EnumValue<CellValues>(CellValues.SharedString);

  rowOffset++; 
   }

 workSheetPart.Worksheet.Save();

        spreadsheet.WorkbookPart.Workbook.Save();
        spreadsheet.Close();

        return template;
like image 430
The Internet Avatar asked May 10 '12 21:05

The Internet


3 Answers

The big improment is more Save() function out of loop

 //Save data
        shareStringPart.SharedStringTable.Save();
        worksheetPart.Worksheet.Save();

For 500 records, for me it change from 10 mins to 1 min.

like image 85
zquanghoangz Avatar answered Sep 20 '22 03:09

zquanghoangz


So it looks like someone in the MSDN community docs ran into similar performance implications. The code below is very inefficient. Someone recommended using a hash table.

For our solution we just removed the insertion of shared strings altogether and went from 1:03 seconds to 0:03 seconds in download time.

//Old: (1:03)
            cell = ExcelWriter.InsertCellIntoWorksheet("A", rowOffset, workSheetPart);
            index = ExcelWriter.InsertSharedStringItem(thing.CreateDate.ToShortDateString(), sharedStringPart);
            cell.CellValue = new CellValue(index.ToString());
            cell.DataType = new DocumentFormat.OpenXml.EnumValue<CellValues>(CellValues.SharedString);

 //New: (0:03)
             cell = ExcelWriter.InsertCellIntoWorksheet("A", rowOffset, workSheetPart);
             cell.CellValue = new CellValue(thing.CreateDate.ToShortDateString());
              cell.DataType = new DocumentFormat.OpenXml.EnumValue<CellValues>(CellValues.String);

MSDN Docs (slow solution, they should use a Hash Table instead)

      private static int InsertSharedStringItem(string text, SharedStringTablePart         shareStringPart)
  {
// If the part does not contain a SharedStringTable, create one.
if (shareStringPart.SharedStringTable == null)
{
    shareStringPart.SharedStringTable = new SharedStringTable();
}

int i = 0;

// Iterate through all the items in the SharedStringTable. If the text already exists, return its index.
foreach (SharedStringItem item in shareStringPart.SharedStringTable.Elements<SharedStringItem>())
{
    if (item.InnerText == text)
    {
        return i;
    }

    i++;
}

// The text does not exist in the part. Create the SharedStringItem and return its index.
shareStringPart.SharedStringTable.AppendChild(new SharedStringItem(new DocumentFormat.OpenXml.Spreadsheet.Text(text)));
shareStringPart.SharedStringTable.Save();

return i;
 }  
like image 28
The Internet Avatar answered Sep 21 '22 03:09

The Internet


@The Internet

Note that String data type is actually for formulas, for text should use InlineString. See 17.18.11 ST_CellType (Cell Type):

  • inlineStr (Inline String) - Cell containing an (inline) rich string, i.e., one not in the shared string table. If this cell type is used, then the cell value is in the is element rather than the v element in the cell (c element).
  • str (String) - Cell containing a formula string.
like image 40
user3285954 Avatar answered Sep 22 '22 03:09

user3285954