Using OpenXmlReader

I hate to resort to StackOverflow for something so (seemingly) basic, but I've been fighting with Microsoft for the last few hours and seem to be hitting a dead end. I am trying to read (large) Excel 2007+ spreadsheets, and Google has kindly informed me that using the OpenXml SDK is a pretty popular choice. So I gave the thing a shot, read some tutorials, checked Microsoft's own library pages, and got very little out of them all.

I am using a small test spreadsheet with just one column of numbers and one of strings - large scale testing will come later. I've tried several implementations similar to the one I am about to post, and none of them read data. The code below was mostly taken from another StackOverflow thread, where it seemed to have worked - not so for me. I figured I'll have you guys check/debug/help with this version, because it'll likely be less broken than anything I have written today.

static void ReadExcelFileSAX(string fileName)
    {
        using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, true))
        {
            WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;
            WorksheetPart worksheetPart = workbookPart.WorksheetParts.First();

            OpenXmlPartReader reader = new OpenXmlPartReader(worksheetPart);
            string text;
            string rowNum;
            while (reader.Read())
            {
                if (reader.ElementType == typeof(Row))
                {
                    do
                    {
                        if (reader.HasAttributes)
                        {
                            rowNum = reader.Attributes.First(a => a.LocalName == "r").Value;
                            Console.Write("rowNum: " + rowNum); //we never even get here, I tested it with a breakpoint
                        }

                    } while (reader.ReadNextSibling()); // Skip to the next row
                    Console.ReadKey();
                    break; // We just looped through all the rows so no need to continue reading the worksheet
                }
                if (reader.ElementType == typeof(Cell))
                {

                }

                if (reader.ElementType != typeof(Worksheet)) // Don't want to skip the contents of the worksheet
                    reader.Skip(); // Skip contents of any node before finding the first row.
            }
            reader.Close();
            Console.WriteLine();
            Console.ReadKey();
        }
    }

And, on a side note, are there any good alternatives to using the OpenXml SDK I have somehow missed?

Asked by Argent on May 11 '12 at 16:05



1 Answer

I think you picked the wrong WorksheetPart for reading the rows.

The line

workbookPart.WorksheetParts.First();

gets the first WorksheetPart in the collection, which is not necessarily the first worksheet as shown in Microsoft Excel.

So, iterate through all WorksheetParts and you should see some output on your console window.

static void ReadExcelFileSAX(string fileName)
{
  // Open read-only (second argument false); nothing is modified here
  using (SpreadsheetDocument spreadsheetDocument =
                                   SpreadsheetDocument.Open(fileName, false))
  {
    WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;

    // Iterate through all WorksheetParts
    foreach (WorksheetPart worksheetPart in workbookPart.WorksheetParts)
    {          
      OpenXmlPartReader reader = new OpenXmlPartReader(worksheetPart);
      string rowNum;
      while (reader.Read())
      {
        if (reader.ElementType == typeof(Row))
        {
          do
          {
            if (reader.HasAttributes)
            {
              rowNum = reader.Attributes.First(a => a.LocalName == "r").Value;
              Console.WriteLine("rowNum: " + rowNum); // one row number per line
            }

          } while (reader.ReadNextSibling()); // Skip to the next row

          break; // We just looped through all the rows so no 
                 // need to continue reading the worksheet
        }

        if (reader.ElementType != typeof(Worksheet))
          reader.Skip(); 
      }
      reader.Close();      
    }
  }  
}
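
If only the sheet that Excel shows first is of interest, one option is to resolve it through the workbook's sheet list instead of the part collection. The following is a minimal sketch, assuming the first entry is a regular worksheet (not a chart sheet); the class and method names are made up for illustration and are not part of the SDK.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

static class SheetLookup
{
    // Hypothetical helper: returns the WorksheetPart of the first sheet
    // in workbook (tab) order.
    public static WorksheetPart GetFirstWorksheetPart(WorkbookPart workbookPart)
    {
        // The <sheet> elements in the workbook are stored in tab order.
        Sheet firstSheet = workbookPart.Workbook.Sheets.Elements<Sheet>().First();

        // Sheet.Id is the relationship id that links the entry to its part.
        // Assumption: the entry is a worksheet; a chart sheet would make
        // this cast fail.
        return (WorksheetPart)workbookPart.GetPartById(firstSheet.Id.Value);
    }
}
```

The returned part can be passed to OpenXmlReader.Create in place of workbookPart.WorksheetParts.First().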

To read all cell values, use the following function (error handling omitted):

static void ReadAllCellValues(string fileName)
{
  using (SpreadsheetDocument spreadsheetDocument = SpreadsheetDocument.Open(fileName, false))
  {
    WorkbookPart workbookPart = spreadsheetDocument.WorkbookPart;

    foreach(WorksheetPart worksheetPart in workbookPart.WorksheetParts)
    {
      OpenXmlReader reader = OpenXmlReader.Create(worksheetPart);

      while (reader.Read())
      {
        if (reader.ElementType == typeof(Row))
        {
          reader.ReadFirstChild();

          do
          {
            if (reader.ElementType == typeof(Cell))
            {
              Cell c = (Cell)reader.LoadCurrentElement();

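              string cellValue;

              if (c.CellValue == null)
              {
                // No stored value (for example a cell that only carries a style)
                cellValue = string.Empty;
              }
              else if (c.DataType != null && c.DataType == CellValues.SharedString)
              {
                // The stored value is an index into the shared string table
                SharedStringItem ssi = workbookPart.SharedStringTablePart.SharedStringTable.Elements<SharedStringItem>().ElementAt(int.Parse(c.CellValue.InnerText));

                // InnerText also covers rich-text items, where ssi.Text is null
                cellValue = ssi.InnerText;
              }
              else
              {
                cellValue = c.CellValue.InnerText;
              }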

              Console.Out.Write("{0}: {1} ", c.CellReference, cellValue);
            }
          } while (reader.ReadNextSibling());
          Console.Out.WriteLine();
        }            
      }
    }   
  }
}

As the code above shows, a cell with data type SharedString stores only an index; the actual text has to be looked up in the SharedStringTablePart.
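
For large spreadsheets, calling ElementAt on the shared string table for every cell walks the table from the start each time. One possible workaround is to load the strings into an array once and index into it. This is only a sketch under that assumption: the SharedStrings class and its methods are hypothetical helpers, and InnerText concatenates all text under an item, which covers rich-text runs but would also pick up phonetic text if a file contains any.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

static class SharedStrings
{
    // Hypothetical helper: reads every shared string once, so that later
    // lookups are plain array accesses.
    public static string[] Load(WorkbookPart workbookPart)
    {
        SharedStringTablePart part = workbookPart.SharedStringTablePart;
        if (part == null)
        {
            // Workbooks without any text cells may have no shared string part.
            return new string[0];
        }

        return part.SharedStringTable
                   .Elements<SharedStringItem>()
                   .Select(item => item.InnerText)
                   .ToArray();
    }

    // Hypothetical helper: resolves the displayed text of a cell using
    // the array produced by Load.
    public static string GetCellText(Cell cell, string[] sharedStrings)
    {
        if (cell.CellValue == null)
        {
            return string.Empty; // no stored value
        }

        string raw = cell.CellValue.InnerText;

        if (cell.DataType != null && cell.DataType.Value == CellValues.SharedString)
        {
            // raw is an index into the shared string table
            return sharedStrings[int.Parse(raw)];
        }

        return raw;
    }
}
```

Inside the cell loop, the lookup then becomes SharedStrings.GetCellText(c, strings), with strings loaded once before iterating over the worksheet parts.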

Answered by Hans on Oct 04 '22 at 09:10