Reading dates from OpenXml Excel files

I'm trying to read data from .xlsx files, using SharpZipLib to unpack them (in memory) and then reading the inner XML files. Everything works except recognizing dates: they're stored as serial numbers (a Julian-style day count), and I need some way to tell whether a number is a date or just a number. In another topic (unfortunately it died, and I need a quick answer) I learned a few things from Mark Baker, but it's still not enough...

"Excel stores dates as a float value... the integer part being the number of days since 1/1/1900 (or 1/1/1904 depending on which calendar is being used), the fractional part being the proportion of a day (ie the time part)... made slightly more awkward by the fact that 1900 is considered a leap year.

The only thing that differentiates a date from a number is the number format mask. If you can read the format mask, you can use that to identify the value as a date rather than a number... then calculate the date value/formatting from the base date."
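To make the arithmetic in that quote concrete, here is a minimal sketch of the conversion. The class name `ExcelSerialDate`, the method `FromSerial` and the `use1904` parameter are made up for illustration, and mapping the phantom serial 60 (Excel's non-existent 29 February 1900) to 1 March 1900 is just one possible choice.

```csharp
using System;

public static class ExcelSerialDate
{
    // Converts an Excel serial value (days plus fraction of a day) to a DateTime.
    // use1904 is a hypothetical flag selecting the 1904 date system.
    public static DateTime FromSerial(double serial, bool use1904 = false)
    {
        if (use1904)
        {
            // 1904 system: serial 0 is 1904-01-01 and there is no phantom leap day.
            return new DateTime(1904, 1, 1).AddDays(serial);
        }

        // 1900 system: serial 1 is 1900-01-01, and serial 60 stands for the
        // non-existent 1900-02-29. From serial 61 onward every value is one
        // day "ahead", so the effective base moves from 1899-12-31 to 1899-12-30.
        // Serial 60 itself ends up on 1900-03-01 here (same as serial 61).
        DateTime baseDate = serial < 61
            ? new DateTime(1899, 12, 31)
            : new DateTime(1899, 12, 30);

        // The fractional part of the serial becomes the time of day.
        return baseDate.AddDays(serial);
    }
}
```

With this sketch, `ExcelSerialDate.FromSerial(38046)` gives 29 February 2004, and `FromSerial(38046.5)` gives noon on the same day.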

"But doesn't the attribute "s" for dates always have the value of "1"? I know it defines style, but maybe? ;)"

The s attribute references a style xf entry in styles.xml, and it won't always be entry 1 for dates... it all depends on how many different styles are being used in the workbook. The style xf in turn references a number format mask. To identify a cell that contains a date, you need to perform the style xf -> numberformat lookup, then identify whether that numberformat mask is a date/time numberformat mask (rather than, for example, a percentage or an accounting numberformat mask).
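Since the question reads the raw XML rather than using the SDK, the lookup described above could be sketched with System.Xml.Linq roughly as follows. `DateStyleFinder`, `FindDateStyleIndexes` and `IsDateMask` are hypothetical names, and the mask test is a crude heuristic (strip literals and bracketed sections, then look for date/time letters), not a full format-mask parser.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class DateStyleFinder
{
    static readonly XNamespace X =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Heuristic: remove "quoted" literals, escaped/padding/fill characters
    // (\x, _x, *x) and [bracketed] sections such as [Red] or [$-409],
    // then look for any date/time letter in what is left.
    public static bool IsDateMask(string mask)
    {
        string stripped = Regex.Replace(mask, @"""[^""]*""|[\\_*].|\[[^\]]*\]", "");
        return Regex.IsMatch(stripped, "[dmyhs]", RegexOptions.IgnoreCase);
    }

    // Returns the cellXfs positions (the values a cell's "s" attribute can take)
    // whose number format resolves to a date/time mask.
    public static HashSet<int> FindDateStyleIndexes(XDocument styles)
    {
        // Custom masks live in <numFmts>; built-in ids have no entry there.
        Dictionary<int, string> customMasks = styles.Descendants(X + "numFmt")
            .ToDictionary(e => (int)e.Attribute("numFmtId"),
                          e => (string)e.Attribute("formatCode"));

        var result = new HashSet<int>();
        List<XElement> xfs = styles.Descendants(X + "cellXfs").Elements(X + "xf").ToList();
        for (int i = 0; i < xfs.Count; i++)
        {
            int id = (int?)xfs[i].Attribute("numFmtId") ?? 0;

            // Built-in date/time ids: 14-22 and 45-47.
            bool builtInDate = (id >= 14 && id <= 22) || (id >= 45 && id <= 47);
            bool customDate = customMasks.TryGetValue(id, out string mask) && IsDateMask(mask);

            if (builtInDate || customDate) result.Add(i);
        }
        return result;
    }
}
```

Loading styles.xml into an `XDocument` and passing it to `FindDateStyleIndexes` yields the set of `s` values that mark date cells. Testing the mask itself, instead of assuming a fixed id range for custom formats, is a deliberate choice here because custom ids are assigned per workbook.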

"One more question - I'm now looking at the content of styles.xml and in the section I see elements like: "<xf numFmtId="14" ... applyNumberFormat="1" />", "<xf numFmtId="1" ... applyNumberFormat="1" />", etc. but there is no <numFmts> section... Are there any "standard" formats? Or am I just missing something?"

Can someone please help me out? Thanks in advance.

brovar asked Jan 11 '11 08:01


1 Answer

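Cells may have styles. A cell's s attribute is a uint that indexes cellXfs in the styleSheet. Each cellXfs item (an xf element) carries a set of attributes. The most important is NumberFormatId (numFmtId in the XML). If its value falls in the range 14-22 it is a "standard" (built-in) date or time format; the built-in ids 45-47 are time formats as well. If it falls in the range 165-180, it is treated here as a "formatted" date and will have a corresponding NumberingFormat (numFmt) entry. Custom ids are assigned per workbook, starting at 164, and can just as well belong to non-date masks, so treat that range as a heuristic and check the entry's formatCode when in doubt.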

Standard Date

[x:c r="A2" s="2"][x:v]38046[/x:v][/x:c]

[x:xf numFmtId="14" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1" /] (ordinal position 2)

Formatted Date

[x:c r="A4" s="4"][x:v]38048[/x:v][/x:c]

[x:xf numFmtId="166" fontId="0" fillId="0" borderId="0" xfId="0" applyNumberFormat="1" /] (ordinal position 4)

[x:numFmt numFmtId="166" formatCode="m/d;@" /]
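The "standard" ids have no numFmt entry in styles.xml because their masks are implied by the file format itself. As a sketch, the built-in date/time masks can be kept in a small table and consulted when no custom entry exists. The masks below are the ones the Open XML spec lists for these ids; `BuiltInFormats` and `ResolveMask` are hypothetical names.

```csharp
using System.Collections.Generic;

public static class BuiltInFormats
{
    // Built-in date/time masks implied by numFmtId (no <numFmt> element needed).
    // Excel may display some of them differently depending on locale.
    public static readonly IReadOnlyDictionary<uint, string> DateTimeMasks =
        new Dictionary<uint, string>
        {
            { 14, "mm-dd-yy" },
            { 15, "d-mmm-yy" },
            { 16, "d-mmm" },
            { 17, "mmm-yy" },
            { 18, "h:mm AM/PM" },
            { 19, "h:mm:ss AM/PM" },
            { 20, "h:mm" },
            { 21, "h:mm:ss" },
            { 22, "m/d/yy h:mm" },
            { 45, "mm:ss" },
            { 46, "[h]:mm:ss" },
            { 47, "mmss.0" },
        };

    // Resolves a numFmtId to a date/time mask: custom entries read from <numFmts>
    // win, then the built-in table. Returns null when neither knows the id
    // (for example 0 = General, or 1 = "0").
    public static string ResolveMask(uint numFmtId, IDictionary<uint, string> customMasks)
    {
        if (customMasks != null && customMasks.TryGetValue(numFmtId, out string custom))
        {
            return custom;
        }
        return DateTimeMasks.TryGetValue(numFmtId, out string builtIn) ? builtIn : null;
    }
}
```

For the two examples above, `ResolveMask(14, null)` returns `"mm-dd-yy"`, and `ResolveMask(166, custom)` returns `"m/d;@"` once `custom` contains the numFmt entry with id 166.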

This code extracts a list of style IDs that correspond to these date formats.

  // Requires: using DocumentFormat.OpenXml.Packaging;
  //           using DocumentFormat.OpenXml.Spreadsheet;
  private void GetDateStyles()
  {
     // The only way to tell dates from numbers is by looking at the style index.
     // This indexes cellXfs, which contains NumberFormatIds, which index NumberingFormats.
     // This method creates a list of the style indexes that pertain to dates.
     WorkbookStylesPart workbookStylesPart = (WorkbookStylesPart) UriPartDictionary["/xl/styles.xml"];
     Stylesheet styleSheet = workbookStylesPart.Stylesheet;
     CellFormats cellFormats = styleSheet.CellFormats;

     int i = 0;
     foreach (CellFormat cellFormat in cellFormats.Elements<CellFormat>())
     {
        // numFmtId is optional on an xf element; treat a missing one as 0 (General).
        uint numberFormatId = cellFormat.NumberFormatId?.Value ?? 0u;
        if ((numberFormatId >= 14u && numberFormatId <= 22u)
        || (numberFormatId >= 165u && numberFormatId <= 180u))
        {
           // i is the position within cellXfs, i.e. the value of a cell's s attribute.
           _DateStyles.Add(i.ToString());
        }
        i++;
     }
  }
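Once the list of date style indexes exists, reading a cell comes down to checking its s attribute against that list. A rough sketch on the raw XML follows; `CellReader` and `ReadCell` are hypothetical names, `dateStyles` stands in for a set like `_DateStyles`, and the workbook is assumed to use the 1900 date system. `DateTime.FromOADate` is the standard .NET method; its base date of 1899-12-30 matches Excel for every serial from 61 (1 March 1900) onward.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Xml.Linq;

public static class CellReader
{
    static readonly XNamespace X =
        "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Returns a DateTime when the cell's style index is one of dateStyles,
    // a double for other numeric cells, and the raw text for non-numeric
    // cell types (shared strings, booleans, ...), which are left untouched.
    public static object ReadCell(XElement cell, ISet<string> dateStyles)
    {
        string style = (string)cell.Attribute("s") ?? "0";
        string type = (string)cell.Attribute("t");   // absent or "n" means numeric
        string raw = (string)cell.Element(X + "v");

        if (raw == null || (type != null && type != "n"))
        {
            return raw;
        }

        // Cell values are always written with '.' as the decimal separator.
        double number = double.Parse(raw, CultureInfo.InvariantCulture);

        return dateStyles.Contains(style)
            ? DateTime.FromOADate(number)
            : (object)number;
    }
}
```

For the cell `[x:c r="A2" s="2"][x:v]38046[/x:v][/x:c]` shown earlier and a set containing "2", this returns 29 February 2004; with a style index that is not in the set it returns the number 38046.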
Pat Dooley answered Nov 13 '22 01:11
