Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

What indicates an Office Open XML Cell contains a Date/Time value?

I'm reading an .xlsx file using the Office Open XML SDK and am confused about reading Date/Time values. One of my spreadsheets has this markup (generated by Excel 2010)

<x:row r="2" spans="1:22" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">   <x:c r="A2" t="s">     <x:v>56</x:v>   </x:c>   <x:c r="B2" t="s">     <x:v>64</x:v>   </x:c>   .   .   .   <x:c r="J2" s="9">     <x:v>17145</x:v>   </x:c> 

Cell J2 has a date serial value in it and a style attribute s="9". However, the Office Open XML Specification says that 9 corresponds to a followed hyperlink. This is a screen shot from page 4,999 of ECMA-376, Second Edition, Part 1 - Fundamentals And Markup Language Reference.pdf.

alt text

The presetCellStyles.xml file included with the spec also refers to builtinId 9 as a followed hyperlink.

<followedHyperlink builtinId="9"> 

All of the styles in the spec are simply visual formatting styles, not number styles. Where are the number styles defined and how does one differentiate a style reference s="9" from indicating a cell formatting (visual) style vs a number style?

Obviously I'm looking in the wrong place to match styles on cells with their number formats. Where's the right place to find this information?

like image 605
Samuel Neff Avatar asked Jan 18 '11 23:01

Samuel Neff


People also ask

What is SharedStringTable?

SharedStringTable in SpreadsheetMLAn instance of this part type contains one occurrence of each unique string that occurs on all worksheets in a workbook. A package shall contain exactly one Shared String Table part. The root element for a part of this content type shall be sst.

Which method is used to retrive string value of the particular cell?

You can use the GetCellValue method to retrieve the value of a cell in a workbook. The method requires the following three parameters: A string that contains the name of the document to examine.

Is XLSX an XML format?

XLSX was originally developed by Microsoft as an XML-based format to replace the earlier proprietary binary format that uses the . xls file extension [MS-XLS]. Since Excel 2007, XLSX has been the default format for the Save operation.

What date was the classic XLSX file's content initially created?

XLSX file format was introduced in 2007 and uses the Open XML standard adapted by Microsoft back in 2000.


2 Answers

The s attribute references a style xf entry in styles.xml. The style xf in turn references a number format mask. To identify a cell that contains a date, you need to perform the style xf -> numberformat lookup, then identify whether that numberformat mask is a date/time numberformat mask (rather than, for example, a percentage or an accounting numberformat mask).

The style.xml file has elements like:

<xf numFmtId="14" ... applyNumberFormat="1" /> <xf numFmtId="1" ... applyNumberFormat="1" /> 

These are the xf entries, which in turn give you a numFmtId that references the number format mask.

You should find the numFmts section somewhere near the top of style.xml, as part of the styleSheet element

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>      <styleSheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main">         <numFmts count="3">             <numFmt numFmtId="164" formatCode="[$-414]mmmm\ yyyy;@" />              <numFmt numFmtId="165" formatCode="0.000" />              <numFmt numFmtId="166" formatCode="#,##0.000" />          </numFmts> 

The number format id may be here, or it may be one of the built-in formats. Number format codes (numFmtId) less than 164 are "built-in".

The list that I have is incomplete:

0 = 'General'; 1 = '0'; 2 = '0.00'; 3 = '#,##0'; 4 = '#,##0.00';  9 = '0%'; 10 = '0.00%'; 11 = '0.00E+00'; 12 = '# ?/?'; 13 = '# ??/??'; 14 = 'mm-dd-yy'; 15 = 'd-mmm-yy'; 16 = 'd-mmm'; 17 = 'mmm-yy'; 18 = 'h:mm AM/PM'; 19 = 'h:mm:ss AM/PM'; 20 = 'h:mm'; 21 = 'h:mm:ss'; 22 = 'm/d/yy h:mm';  37 = '#,##0 ;(#,##0)'; 38 = '#,##0 ;[Red](#,##0)'; 39 = '#,##0.00;(#,##0.00)'; 40 = '#,##0.00;[Red](#,##0.00)';  44 = '_("$"* #,##0.00_);_("$"* \(#,##0.00\);_("$"* "-"??_);_(@_)'; 45 = 'mm:ss'; 46 = '[h]:mm:ss'; 47 = 'mmss.0'; 48 = '##0.0E+0'; 49 = '@';  27 = '[$-404]e/m/d'; 30 = 'm/d/yy'; 36 = '[$-404]e/m/d'; 50 = '[$-404]e/m/d'; 57 = '[$-404]e/m/d';  59 = 't0'; 60 = 't0.00'; 61 = 't#,##0'; 62 = 't#,##0.00'; 67 = 't0%'; 68 = 't0.00%'; 69 = 't# ?/?'; 70 = 't# ??/??'; 

The missing values are mainly related to east asian variant formats.

like image 196
Mark Baker Avatar answered Sep 18 '22 20:09

Mark Baker


Thought I'd add my solution that I've put together to determine if the double value FromOADate is really a date or not. Reason being is I have a zip code in my excel file as well. The numberingFormat will be null if it's text.

Alternatively you could use the numberingFormatId and check against a list of Ids that Excel uses for dates.

In my case I've explicitly determined the formatting of all fields for the client.

    /// <summary>     /// Creates the datatable and parses the file into a datatable     /// </summary>     /// <param name="fileName">the file upload's filename</param>     private void ReadAsDataTable(string fileName)     {         try         {             DataTable dt = new DataTable();             using (SpreadsheetDocument spreadSheetDocument = SpreadsheetDocument.Open(string.Format("{0}/{1}", UploadPath, fileName), false))             {                 WorkbookPart workbookPart = spreadSheetDocument.WorkbookPart;                 IEnumerable<Sheet> sheets = spreadSheetDocument.WorkbookPart.Workbook.GetFirstChild<Sheets>().Elements<Sheet>();                 string relationshipId = sheets.First().Id.Value;                 WorksheetPart worksheetPart = (WorksheetPart)spreadSheetDocument.WorkbookPart.GetPartById(relationshipId);                 Worksheet workSheet = worksheetPart.Worksheet;                 SheetData sheetData = workSheet.GetFirstChild<SheetData>();                 IEnumerable<Row> rows = sheetData.Descendants<Row>();                  var cellFormats = workbookPart.WorkbookStylesPart.Stylesheet.CellFormats;                 var numberingFormats = workbookPart.WorkbookStylesPart.Stylesheet.NumberingFormats;                  // columns omitted for brevity                  // skip first row as this row is column header names                 foreach (Row row in rows.Skip(1))                 {                     DataRow dataRow = dt.NewRow();                      for (int i = 0; i < row.Descendants<Cell>().Count(); i++)                     {                         bool isDate = false;                         var styleIndex = (int)row.Descendants<Cell>().ElementAt(i).StyleIndex.Value;                         var cellFormat = (CellFormat)cellFormats.ElementAt(styleIndex);                          if (cellFormat.NumberFormatId != null)                         {                             var numberFormatId = cellFormat.NumberFormatId.Value;                             var numberingFormat = numberingFormats.Cast<NumberingFormat>()                                 .SingleOrDefault(f => f.NumberFormatId.Value == numberFormatId);                              // Here's yer string! Example: $#,##0.00_);[Red]($#,##0.00)                             if (numberingFormat != null && numberingFormat.FormatCode.Value.Contains("mm/dd/yy"))                             {                                 string formatString = numberingFormat.FormatCode.Value;                                 isDate = true;                             }                         }                          // replace '-' with empty string                         string value = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i), isDate);                         dataRow[i] = value.Equals("-") ? string.Empty : value;                     }                      dt.Rows.Add(dataRow);                 }             }              this.InsertMembers(dt);             dt.Clear();         }         catch (Exception ex)         {             LogHelper.Error(typeof(MemberUploadApiController), ex.Message, ex);         }     }      /// <summary>     /// Reads the cell's value     /// </summary>     /// <param name="document">current document</param>     /// <param name="cell">the cell to read</param>     /// <returns>cell's value</returns>     private string GetCellValue(SpreadsheetDocument document, Cell cell, bool isDate)     {         string value = string.Empty;          try         {             SharedStringTablePart stringTablePart = document.WorkbookPart.SharedStringTablePart;             value = cell.CellValue.InnerXml;              if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)             {                 return stringTablePart.SharedStringTable.ChildElements[Int32.Parse(value)].InnerText;             }             else             {                 // check if this is a date or zip.                 // integers will be passed into this else statement as well.                  if (isDate)                 {                     value = DateTime.FromOADate(double.Parse(value)).ToString();                 }                  return value;             }         }         catch (Exception ex)         {             LogHelper.Error(typeof(MemberUploadApiController), ex.Message, ex);         }          return value;     } 
like image 44
Rob Scott Avatar answered Sep 17 '22 20:09

Rob Scott