Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Open XML SDK v2.0 Performance issue when deleting a first row in 20,000+ rows Excel file

Do anyone come across a performance issue when deleting a first row in a 20,000+ rows Excel file using OpenXML SDK v2.0?

I am using the delete row coding suggested in the Open XML SDK document. It takes me several minutes just to delete the first row using Open XML SDK, but it only takes just a second in Excel applicaton.

I eventually found out that the bottle-neck is actually on the bubble-up approach in dealing with row deletion. There are many rows updating after the deleted row. So in my case, there are around 20,000 rows to be updated, shifting up the data row by row.

I wonder if there is any faster way to do the row deletion.

Do anybody have an idea?

like image 335
Perpetual Learner Avatar asked Dec 10 '09 01:12

Perpetual Learner


People also ask

What is OpenXML for Excel?

Open format? Office Open XML (also informally known as OOXML) is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. Ecma International standardized the initial version as ECMA-376.

What is OpenXML C#?

OpenXML is also known as OOXML and it fully XML-based format for office documents, including word processing documents, spreadsheets, presentations, as well as charts, diagrams, shapes, and other graphical material.


1 Answers

Well, the bad news here is: yep, that's the way it is.

You may get slightly better performance moving outside of the SDK itself into System.IO.Packaging and just creating an IEnumerable/List in like Linq-to-XML of all the rows, copy that to a new IEnumerable/List without the first row, rewrite the r attribute of <row r="?"/> to be it's place in the index, and the write that back inside <sheetData/> over existing children.

You'd need to kind of do the same for any strings in the sharedStrings.xml file - i.e. removing the <ssi>.<si> elements that were in the row that was deleted, but in this case they are now implicitly indexed, so you'd be able to get away with just outright removing them.

like image 191
Todd Main Avatar answered Oct 03 '22 03:10

Todd Main