I'm reading an .xlsx file using the Office Open XML SDK and am confused about reading Date/Time values. One of my spreadsheets has this markup (generated by Excel 2010)
<x:row r="2" spans="1:22" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main"> <x:c r="A2" t="s"> <x:v>56</x:v> </x:c> <x:c r="B2" t="s"> <x:v>64</x:v> </x:c> . . . <x:c r="J2" s="9"> <x:v>17145</x:v> </x:c>
Cell J2 has a date serial value in it and a style attribute s="9"
. However, the Office Open XML Specification says that 9 corresponds to a followed hyperlink. This is a screen shot from page 4,999 of ECMA-376, Second Edition, Part 1 - Fundamentals And Markup Language Reference.pdf.
The presetCellStyles.xml file included with the spec also refers to builtinId
9 as a followed hyperlink.
<followedHyperlink builtinId="9">
All of the styles in the spec are simply visual formatting styles, not number styles. Where are the number styles defined and how does one differentiate a style reference s="9"
from indicating a cell formatting (visual) style vs a number style?
Obviously I'm looking in the wrong place to match styles on cells with their number formats. Where's the right place to find this information?
SharedStringTable in SpreadsheetMLAn instance of this part type contains one occurrence of each unique string that occurs on all worksheets in a workbook. A package shall contain exactly one Shared String Table part. The root element for a part of this content type shall be sst.
You can use the GetCellValue method to retrieve the value of a cell in a workbook. The method requires the following three parameters: A string that contains the name of the document to examine.
XLSX was originally developed by Microsoft as an XML-based format to replace the earlier proprietary binary format that uses the . xls file extension [MS-XLS]. Since Excel 2007, XLSX has been the default format for the Save operation.
XLSX file format was introduced in 2007 and uses the Open XML standard adapted by Microsoft back in 2000.
The s attribute references a style xf entry in styles.xml. The style xf in turn references a number format mask. To identify a cell that contains a date, you need to perform the style xf -> numberformat lookup, then identify whether that numberformat mask is a date/time numberformat mask (rather than, for example, a percentage or an accounting numberformat mask).
The style.xml file has elements like:
<xf numFmtId="14" ... applyNumberFormat="1" /> <xf numFmtId="1" ... applyNumberFormat="1" />
These are the xf entries, which in turn give you a numFmtId that references the number format mask.
You should find the numFmts section somewhere near the top of style.xml, as part of the styleSheet element
<?xml version="1.0" encoding="UTF-8" standalone="yes" ?> <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"> <numFmts count="3"> <numFmt numFmtId="164" formatCode="[$-414]mmmm\ yyyy;@" /> <numFmt numFmtId="165" formatCode="0.000" /> <numFmt numFmtId="166" formatCode="#,##0.000" /> </numFmts>
The number format id may be here, or it may be one of the built-in formats. Number format codes (numFmtId) less than 164 are "built-in".
The list that I have is incomplete:
0 = 'General'; 1 = '0'; 2 = '0.00'; 3 = '#,##0'; 4 = '#,##0.00'; 9 = '0%'; 10 = '0.00%'; 11 = '0.00E+00'; 12 = '# ?/?'; 13 = '# ??/??'; 14 = 'mm-dd-yy'; 15 = 'd-mmm-yy'; 16 = 'd-mmm'; 17 = 'mmm-yy'; 18 = 'h:mm AM/PM'; 19 = 'h:mm:ss AM/PM'; 20 = 'h:mm'; 21 = 'h:mm:ss'; 22 = 'm/d/yy h:mm'; 37 = '#,##0 ;(#,##0)'; 38 = '#,##0 ;[Red](#,##0)'; 39 = '#,##0.00;(#,##0.00)'; 40 = '#,##0.00;[Red](#,##0.00)'; 44 = '_("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_)'; 45 = 'mm:ss'; 46 = '[h]:mm:ss'; 47 = 'mmss.0'; 48 = '##0.0E+0'; 49 = '@'; 27 = '[$-404]e/m/d'; 30 = 'm/d/yy'; 36 = '[$-404]e/m/d'; 50 = '[$-404]e/m/d'; 57 = '[$-404]e/m/d'; 59 = 't0'; 60 = 't0.00'; 61 = 't#,##0'; 62 = 't#,##0.00'; 67 = 't0%'; 68 = 't0.00%'; 69 = 't# ?/?'; 70 = 't# ??/??';
The missing values are mainly related to east asian variant formats.
Thought I'd add my solution that I've put together to determine if the double value FromOADate
is really a date or not. Reason being is I have a zip code in my excel file as well. The numberingFormat
will be null if it's text.
Alternatively you could use the numberingFormatId
and check against a list of Ids
that Excel uses for dates.
In my case I've explicitly determined the formatting of all fields for the client.
/// <summary> /// Creates the datatable and parses the file into a datatable /// </summary> /// <param name="fileName">the file upload's filename</param> private void ReadAsDataTable(string fileName) { try { DataTable dt = new DataTable(); using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(string.Format("{0}/{1}", UploadPath, fileName), false)) { WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart; IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>(); string relationshipId = sheets.First().Id.Value; WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId); Worksheet workSheet = worksheetPart.Worksheet; SheetData sheetData = workSheet.GetFirstChild<SheetData>(); IEnumerable<Row> rows = sheetData.Descendants<Row>(); var cellFormats = workbookPart.WorkbookStylesPart.Stylesheet.CellFormats; var numberingFormats = workbookPart.WorkbookStylesPart.Stylesheet.NumberingFormats; // columns omitted for brevity // skip first row as this row is column header names foreach (Row row in rows.Skip(1)) { DataRow dataRow = dt.NewRow(); for (int i = 0; i < row.Descendants<Cell>().Count(); i++) { bool isDate = false; var styleIndex = (int)row.Descendants<Cell>().ElementAt(i).StyleIndex.Value; var cellFormat = (CellFormat)cellFormats.ElementAt(styleIndex); if (cellFormat.NumberFormatId != null) { var numberFormatId = cellFormat.NumberFormatId.Value; var numberingFormat = numberingFormats.Cast<NumberingFormat>() .SingleOrDefault(f => f.NumberFormatId.Value == numberFormatId); // Here's yer string! Example: $#,##0.00_);[Red]($#,##0.00) if (numberingFormat != null && numberingFormat.FormatCode.Value.Contains("mm/dd/yy")) { string formatString = numberingFormat.FormatCode.Value; isDate = true; } } // replace '-' with empty string string value = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i), isDate); dataRow[i] = value.Equals("-") ? string.Empty : value; } dt.Rows.Add(dataRow); } } this.InsertMembers(dt); dt.Clear(); } catch (Exception ex) { LogHelper.Error(typeof(MemberUploadApiController), ex.Message, ex); } } /// <summary> /// Reads the cell's value /// </summary> /// <param name="document">current document</param> /// <param name="cell">the cell to read</param> /// <returns>cell's value</returns> private string GetCellValue(SpreadsheetDocument document, Cell cell, bool isDate) { string value = string.Empty; try { SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart; value = cell.CellValue.InnerXml; if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString) { return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText; } else { // check if this is a date or zip. // integers will be passed into this else statement as well. if (isDate) { value = DateTime.FromOADate(double.Parse(value)).ToString(); } return value; } } catch (Exception ex) { LogHelper.Error(typeof(MemberUploadApiController), ex.Message, ex); } return value; }
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With