I am using the accepted solution here to convert an excel sheet into a datatable. This works fine if I have "perfect" data but if I have a blank cell in the middle of my data it seems to put the wrong data in each column. I think this is because in the below code: <pre class="prettyprint"><code>row.Descendants<Cell>().Count() </code></pre> is number of populated cells (not all columns) AND: <pre class="prettyprint"><code>GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i)); </code></pre> seems to find the next populated cell (not necessarily what is in that index) so if the first column is empty and i call ElementAt(0), it returns the value in the second column. Here is the full parsing code. <pre class="prettyprint"><code>DataRow tempRow = dt.NewRow(); for (int i = 0; i < row.Descendants<Cell>().Count(); i++) { tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i)); if (tempRow[i].ToString().IndexOf("Latency issues in") > -1) { Console.Write(tempRow[i].ToString()); } } </code></pre>

This makes sense since Excel will not store a value for a cell that is null. If you open your file using the Open XML SDK 2.0 Productivity Tool and traverse the XML down to the cell level you will see that only the cells that have data are going to be in that file. Your options are to insert blank data in the range of cells you are going to traverse or programmatically figure out a cell was skipped and adjust your index appropriately. I made an example excel document with a string in cell reference A1 and C1. I then opened up the excel document in the Open XML Productivity Tool and here is the XML that was stored: <pre class="prettyprint"><code><x:row r="1" spans="1:3" xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main"> <x:c r="A1" t="s"> <x:v>0</x:v> </x:c> <x:c r="C1" t="s"> <x:v>1</x:v> </x:c> </x:row> </code></pre> Here you will see that the data corresponds to the first row and that only two cells worth of data are saved for that row. The data saved corresponds to A1 and C1 and that no cells with null values are saved. To get the functionality that you need, you can traverse over the Cells as you are doing above, but you will need to check what the value the Cell is referencing and determine if any Cells have been skipped. to do that you will need two utility functions to get the Column Name from the cell reference and to then translate that column name into a zero based index: <pre class="prettyprint"><code> private static List<char> Letters = new List<char>() { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', ' ' }; /// <summary> /// Given a cell name, parses the specified cell to get the column name. /// </summary> /// <param name="cellReference">Address of the cell (ie. B2)</param> /// <returns>Column Name (ie. B)</returns> public static string GetColumnName(string cellReference) { // Create a regular expression to match the column name portion of the cell name. Regex regex = new Regex("[A-Za-z]+"); Match match = regex.Match(cellReference); return match.Value; } /// <summary> /// Given just the column name (no row index), it will return the zero based column index. /// Note: This method will only handle columns with a length of up to two (ie. A to Z and AA to ZZ). /// A length of three can be implemented when needed. /// </summary> /// <param name="columnName">Column Name (ie. A or AB)</param> /// <returns>Zero based index if the conversion was successful; otherwise null</returns> public static int? GetColumnIndexFromName(string columnName) { int? columnIndex = null; string[] colLetters = Regex.Split(columnName, "([A-Z]+)"); colLetters = colLetters.Where(s => !string.IsNullOrEmpty(s)).ToArray(); if (colLetters.Count() <= 2) { int index = 0; foreach (string col in colLetters) { List<char> col1 = colLetters.ElementAt(index).ToCharArray().ToList(); int? indexValue = Letters.IndexOf(col1.ElementAt(index)); if (indexValue != -1) { // The first letter of a two digit column needs some extra calculations if (index == 0 && colLetters.Count() == 2) { columnIndex = columnIndex == null ? (indexValue + 1) * 26 : columnIndex + ((indexValue + 1) * 26); } else { columnIndex = columnIndex == null ? indexValue : columnIndex + indexValue; } } index++; } } return columnIndex; } </code></pre> Then you can iterate over the Cells and check to see what the cell reference is compared to the columnIndex. If it is less than then you add blank data to your tempRow, otherwise just read in the value contained in the cell. (Note: I did not test the code below, but the general idea should help): <pre class="prettyprint"><code>DataRow tempRow = dt.NewRow(); int columnIndex = 0; foreach (Cell cell in row.Descendants<Cell>()) { // Gets the column index of the cell with data int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference)); if (columnIndex < cellColumnIndex) { do { tempRow[columnIndex] = //Insert blank data here; columnIndex++; } while(columnIndex < cellColumnIndex); } tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell); if (tempRow[i].ToString().IndexOf("Latency issues in") > -1) { Console.Write(tempRow[i].ToString()); } columnIndex++; } </code></pre>

reading Excel Open XML is ignoring blank cells

Tags:

excel

datatable

openxml

openxml-sdk

I am using the accepted solution here to convert an excel sheet into a datatable. This works fine if I have "perfect" data but if I have a blank cell in the middle of my data it seems to put the wrong data in each column.

I think this is because in the below code:

row.Descendants<Cell>().Count()

is number of populated cells (not all columns) AND:

GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));

seems to find the next populated cell (not necessarily what is in that index) so if the first column is empty and i call ElementAt(0), it returns the value in the second column.

Here is the full parsing code.

DataRow tempRow = dt.NewRow();  for (int i = 0; i < row.Descendants<Cell>().Count(); i++) {     tempRow[i] = GetCellValue(spreadSheetDocument, row.Descendants<Cell>().ElementAt(i));     if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)     {         Console.Write(tempRow[i].ToString());     } }

740

asked Oct 01 '10 09:10

leora

1 Answers

This makes sense since Excel will not store a value for a cell that is null. If you open your file using the Open XML SDK 2.0 Productivity Tool and traverse the XML down to the cell level you will see that only the cells that have data are going to be in that file.

Your options are to insert blank data in the range of cells you are going to traverse or programmatically figure out a cell was skipped and adjust your index appropriately.

I made an example excel document with a string in cell reference A1 and C1. I then opened up the excel document in the Open XML Productivity Tool and here is the XML that was stored:

<x:row r="1" spans="1:3"     xmlns:x="http://schemas.openxmlformats.org/spreadsheetml/2006/main">   <x:c r="A1" t="s">     <x:v>0</x:v>   </x:c>   <x:c r="C1" t="s">     <x:v>1</x:v>   </x:c> </x:row>

Here you will see that the data corresponds to the first row and that only two cells worth of data are saved for that row. The data saved corresponds to A1 and C1 and that no cells with null values are saved.

To get the functionality that you need, you can traverse over the Cells as you are doing above, but you will need to check what the value the Cell is referencing and determine if any Cells have been skipped. to do that you will need two utility functions to get the Column Name from the cell reference and to then translate that column name into a zero based index:

    private static List<char> Letters = new List<char>() { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', ' ' };      /// <summary>     /// Given a cell name, parses the specified cell to get the column name.     /// </summary>     /// <param name="cellReference">Address of the cell (ie. B2)</param>     /// <returns>Column Name (ie. B)</returns>     public static string GetColumnName(string cellReference)     {         // Create a regular expression to match the column name portion of the cell name.         Regex regex = new Regex("[A-Za-z]+");         Match match = regex.Match(cellReference);          return match.Value;     }      /// <summary>     /// Given just the column name (no row index), it will return the zero based column index.     /// Note: This method will only handle columns with a length of up to two (ie. A to Z and AA to ZZ).      /// A length of three can be implemented when needed.     /// </summary>     /// <param name="columnName">Column Name (ie. A or AB)</param>     /// <returns>Zero based index if the conversion was successful; otherwise null</returns>     public static int? GetColumnIndexFromName(string columnName)     {         int? columnIndex = null;          string[] colLetters = Regex.Split(columnName, "([A-Z]+)");         colLetters = colLetters.Where(s => !string.IsNullOrEmpty(s)).ToArray();          if (colLetters.Count() <= 2)         {             int index = 0;             foreach (string col in colLetters)             {                 List<char> col1 = colLetters.ElementAt(index).ToCharArray().ToList();                 int? indexValue = Letters.IndexOf(col1.ElementAt(index));                  if (indexValue != -1)                 {                     // The first letter of a two digit column needs some extra calculations                     if (index == 0 && colLetters.Count() == 2)                     {                         columnIndex = columnIndex == null ? (indexValue + 1) * 26 : columnIndex + ((indexValue + 1) * 26);                     }                     else                     {                         columnIndex = columnIndex == null ? indexValue : columnIndex + indexValue;                     }                 }                  index++;             }         }          return columnIndex;     }

Then you can iterate over the Cells and check to see what the cell reference is compared to the columnIndex. If it is less than then you add blank data to your tempRow, otherwise just read in the value contained in the cell. (Note: I did not test the code below, but the general idea should help):

DataRow tempRow = dt.NewRow();  int columnIndex = 0; foreach (Cell cell in row.Descendants<Cell>()) {    // Gets the column index of the cell with data    int cellColumnIndex = (int)GetColumnIndexFromName(GetColumnName(cell.CellReference));     if (columnIndex < cellColumnIndex)    {       do       {          tempRow[columnIndex] = //Insert blank data here;          columnIndex++;       }       while(columnIndex < cellColumnIndex);     }     tempRow[columnIndex] = GetCellValue(spreadSheetDocument, cell);      if (tempRow[i].ToString().IndexOf("Latency issues in") > -1)     {        Console.Write(tempRow[i].ToString());     }     columnIndex++; }

answered Sep 24 '22 21:09

amurra

Related questions
                            
                                Excel how to find values in 1 column exist in the range of values in another
                            
                                How to disable Excel's automatic cell reference change after copy/paste?
                            
                                Difference between 'on error goto 0' and 'on error goto -1' -- VBA
                            
                                How to convert a date in Excel to ISO 8601 format
                            
                                Why MS Excel crashes and closes during Worksheet_Change Sub procedure?
                            
                                Excel Formula to SUMIF date falls in particular month
                            
                                Get User Selected Range
                            
                                Format numbers in thousands (K) in Excel
                            
                                If WorkSheet("wsName") Exists [duplicate]
                            
                                Reordering Chart Data Series [closed]
                            
                                Border around each cell in a range
                            
                                Substitute a comma with a line break in a cell
                            
                                Excel 2013 VBA Clear All Filters macro
                            
                                How can I create a calendar input in VBA Excel?
                            
                                Put Excel-VBA code in module or sheet? [closed]
                            
                                Pandas: Reading Excel with merged cells
                            
                                Driving Excel from Python in Windows
                            
                                get number of columns of a particular row in given excel using Java
                            
                                Open XML SDK 2.0 - how to update a cell in a spreadsheet?
                            
                                How can I programmatically freeze the top row of an Excel worksheet in Excel 2007 VBA?

Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!

Donate Us With