Do anyone come across a performance issue when deleting a first row in a 20,000+ rows Excel file using OpenXML SDK v2.0?
I am using the delete row coding suggested in the Open XML SDK document. It takes me several minutes just to delete the first row using Open XML SDK, but it only takes just a second in Excel applicaton.
I eventually found out that the bottle-neck is actually on the bubble-up approach in dealing with row deletion. There are many rows updating after the deleted row. So in my case, there are around 20,000 rows to be updated, shifting up the data row by row.
I wonder if there is any faster way to do the row deletion.
Do anybody have an idea?
Open format? Office Open XML (also informally known as OOXML) is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. Ecma International standardized the initial version as ECMA-376.
OpenXML is also known as OOXML and it fully XML-based format for office documents, including word processing documents, spreadsheets, presentations, as well as charts, diagrams, shapes, and other graphical material.
Well, the bad news here is: yep, that's the way it is.
You may get slightly better performance moving outside of the SDK itself into System.IO.Packaging
and just creating an IEnumerable
/List
in like Linq-to-XML of all the rows, copy that to a new IEnumerable
/List
without the first row, rewrite the r
attribute of <row r="?"/>
to be it's place in the index, and the write that back inside <sheetData/>
over existing children.
You'd need to kind of do the same for any strings in the sharedStrings.xml file - i.e. removing the <ssi>.<si>
elements that were in the row that was deleted, but in this case they are now implicitly indexed, so you'd be able to get away with just outright removing them.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With