Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

OpenXML - Cell.DateType is null

Tags:

c#

excel

openxml

I can't determine when a Cell is a date.

Date DataType Cell

I noticed the DataType is null so I can't distinguish if it's a Number of a Date.

I am using the next code to extract the cells:

WorksheetPart worksheetPart = (WorksheetPart)workbookPart.GetPartById(worksheetId);
SheetData sheetData = worksheetPart.Worksheet.GetFirstChild<SheetData>();
Row[] rows = worksheetPart.Worksheet.Descendants<Row>().ToArray();
for (int i = 0; i < rows.Length; i++)
{
    List<Cell> cells = rows[i].Elements<Cell>().ToList();
    foreach (var cell in cells) 
    {
        if (cell.DataType != null && cell.DataType.Value == CellValues.Date)
            //this line is not hit for some reason
         }
    }
}

Am I missing something?

like image 956
Fernando Gutierrez Avatar asked Apr 16 '16 23:04

Fernando Gutierrez


2 Answers

In short it is null because it is supposed to be for numeric and date types.

OpenXML documentation on msdn

The value of the DataType property is null for numeric and date types. It contains the value CellValues.SharedString for strings, and CellValues.Boolean for Boolean values.

There is a way though to distinguish between date and number cell formats using the NumberFormatId on the CellFormat. The trick is finding what id maps to what format. You can find out what format to use by creating new excel file and set a cell to the format in question (i.e. date):

enter image description here

Then extract excel file using 7zip and look inside xl/styles.xml file:

enter image description here

In the image above you can see this formatId 14 translates to short date. For a complete list of formats please refer to ECMA-376 documentation for Office Open XML formats (number format table should be buried somewhere inside part 4. They moved it to part 1 in section 18.8.30).

I created an enumeration for the most common formatIds:

private enum Formats
    {
        General = 0,
        Number = 1,
        Decimal = 2,
        Currency = 164,
        Accounting = 44,
        DateShort = 14,
        DateLong = 165,
        Time = 166,
        Percentage = 10,
        Fraction = 12,
        Scientific = 11,
        Text = 49
    }

You could then create a helper function that will get you the formatted value the way you would like:

private static string GetFormattedCellValue(WorkbookPart workbookPart, Cell cell)
    {
        if (cell == null)
        {
            return null;
        }

        string value = "";
        if (cell.DataType == null) // number & dates
        {
            int styleIndex = (int)cell.StyleIndex.Value;
            CellFormat cellFormat = (CellFormat)workbookPart.WorkbookStylesPart.Stylesheet.CellFormats.ElementAt(styleIndex);
            uint formatId = cellFormat.NumberFormatId.Value;
            
            if (formatId == (uint)Formats.DateShort || formatId == (uint)Formats.DateLong)
            {
                double oaDate;
                if (double.TryParse(cell.InnerText, out oaDate))
                {
                    value = DateTime.FromOADate(oaDate).ToShortDateString();
                }
            }
            else
            {
                value = cell.InnerText;
            }
        }
        else // Shared string or boolean
        {
            switch (cell.DataType.Value)
            {
                case CellValues.SharedString:
                    SharedStringItem ssi = workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(int.Parse(cell.CellValue.InnerText));
                    value = ssi.Text.Text;
                    break;
                case CellValues.Boolean:
                    value = cell.CellValue.InnerText == "0" ? "false" : "true";
                    break;
                default:
                    value = cell.CellValue.InnerText;
                    break;
            }
        }

        return value;
    }
like image 178
dwp4ge Avatar answered Nov 14 '22 15:11

dwp4ge


The previous answer states that the formats are documented in Part 4 but apparently they have been moved to Part 1. I found them by searching for #,##0.00. They are currently in 18.8.30 numFmt (Number Format) for the number formats whose formatCode value is implied rather than explicitly saved in the file. See 18.8.31 numFmts (Number Formats) for number formats in this workbook, consisting of a sequence of numFmt records, where each numFmt record defines a particular number format, indicating how to format and render the numeric value of a cell.

The following are the implied formatCode attribute for all languages.

ID formatCode
0 General
1 0
2 0.00
3 #,##0
4 #,##0.00
9 0%
10 0.00%
11 0.00E+00
12 # ?/?
13 # ??/??
14 mm-dd-yy
15 d-mmm-yy
16 d-mmm
17 mmm-yy
18 h:mm AM/PM
19 h:mm:ss AM/PM
20 h:mm
21 h:mm:ss
22 m/d/yy h:mm
37 #,##0 ;(#,##0)
38 #,##0 ;[Red](#,##0)
39 #,##0.00;(#,##0.00)
40 #,##0.00;[Red](#,##0.00)
45 mm:ss
46 [h]:mm:ss
47 mmss.0
48 ##0.0E+0
49 @

There are many other implied format codes that are language-dependent, such as:

28 [$-404]e"年"m"月"d"日" m"月"d"日"
like image 23
user34660 Avatar answered Nov 14 '22 16:11

user34660