I've read a bunch of stuff on the web about how to get at cell data using the OpenXML API. But there's really not much out there that's particularly straightforward. Most seems to be about writing to SpreadsheetML, not reading... but even that doesn't help much. I've got a spreadsheet that has a table in it. I know what the table name is, and I can find out what sheet it's on, and what columns are in the table. But I can't figure out how to get a collection of rows back that contain the data in the table.
I've got this to load the document and get a handle to the workbook:
SpreadsheetDocument document = SpreadsheetDocument.Open("file.xlsx", false);
WorkbookPart workbook = document.WorkbookPart;
I've got this to find the table/sheet:
Table table = null;
foreach (Sheet sheet in workbook.Workbook.GetFirstChild<Sheets>())
{
WorksheetPart worksheetPart = (WorksheetPart)document.WorkbookPart.GetPartById(sheet.Id);
foreach (TableDefinitionPart tableDefinitionPart in worksheetPart.TableDefinitionParts)
{
if (tableDefinitionPart.Table.DisplayName == this._tableName)
{
table = tableDefinitionPart.Table;
break;
}
}
}
And I can iterate over the columns in the table by foreaching over table.TableColumns.
Open format? Office Open XML (also informally known as OOXML) is a zipped, XML-based file format developed by Microsoft for representing spreadsheets, charts, presentations and word processing documents. Ecma International standardized the initial version as ECMA-376.
xls files. Excel for . NET can load and save data and formatting information in OpenXml files; however, formulas are not loaded or saved. They are copied in BIFF format as opaque.
To read an Excel 2007/2010 spreadsheet with OpenXML API is really easy. Somehow even simpler than using OleDB as we always did as quick & dirty solution. Moreover it's not just simple but verbose, I think to put all the code here isn't useful if it has to be commented and explained too so I'll write just a summary and I'll link a good article. Read this article on MSDN, it explain how to read XLSX documents in a very easy way.
Just to summarize you'll do this:
SpreadsheetDocument
with SpreadsheetDocument.Open
.Sheet
you need with a LINQ query from the WorkbookPart
of the document.WorksheetPart
(the object you need) using the Id of the Sheet
.In code, stripping comments and error handling:
using (SpreadsheetDocument document = SpreadsheetDocument.Open(fileName, false))
{
Sheet sheet = document.WorkbookPart.Workbook
.Descendants<Sheet>()
.Where(s => s.Name == sheetName)
.FirstOrDefault();
WorksheetPart sheetPart =
(WorksheetPart)(document.WorkbookPart.GetPartById(theSheet.Id));
}
Now (but inside the using!) what you have to do is just to read a cell value:
Cell cell = sheetPart.Worksheet.Descendants<Cell>().
Where(c => c.CellReference == addressName).FirstOrDefault();
If you have to enumerate the rows (and they are a lot) you have first to obtain a reference to the SheetData
object:
SheetData sheetData = sheetPart.Worksheet.Elements<SheetData>().First();
Now you can ask for all the rows and cells:
foreach (Row row in sheetData.Elements<Row>())
{
foreach (Cell cell in row.Elements<Cell>())
{
string text = cell.CellValue.Text;
// Do something with the cell value
}
}
To simply enumerate a normal spreadsheet you can use Descendants<Row>()
of the WorksheetPart
object.
If you need more resources about OpenXML take a look at OpenXML Developer, it contains a lot of good tutorials.
There are probably many better ways to code this up, but I slapped this together because I needed it, so hopefully it will help some others.
using DocumentFormat.OpenXml.Spreadsheet;
using DocumentFormat.OpenXml.Packaging;
private static DataTable genericExcelTable(FileInfo fileName)
{
DataTable dataTable = new DataTable();
try
{
using (SpreadsheetDocument doc = SpreadsheetDocument.Open(fileName.FullName, false))
{
Workbook wkb = doc.WorkbookPart.Workbook;
Sheet wks = wkb.Descendants<Sheet>().FirstOrDefault();
SharedStringTable sst = wkb.WorkbookPart.SharedStringTablePart.SharedStringTable;
List<SharedStringItem> allSSI = sst.Descendants<SharedStringItem>().ToList<SharedStringItem>();
WorksheetPart wksp = (WorksheetPart)doc.WorkbookPart.GetPartById(wks.Id);
foreach (TableDefinitionPart tdp in wksp.TableDefinitionParts)
{
QueryTablePart qtp = tdp.QueryTableParts.FirstOrDefault<QueryTablePart>();
Table excelTable = tdp.Table;
int colcounter = 0;
foreach (TableColumn col in excelTable.TableColumns)
{
DataColumn dcol = dataTable.Columns.Add(col.Name);
dcol.SetOrdinal(colcounter);
colcounter++;
}
SheetData data = wksp.Worksheet.Elements<SheetData>().First();
foreach (DocumentFormat.OpenXml.Spreadsheet.Row row in data)
{
if (isInTable(row.Descendants<Cell>().FirstOrDefault(), excelTable.Reference, true))
{
int cellcount = 0;
DataRow dataRow = dataTable.NewRow();
foreach (Cell cell in row.Elements<Cell>())
{
if (cell.DataType != null && cell.DataType.InnerText == "s")
{
dataRow[cellcount] = allSSI[int.Parse(cell.CellValue.InnerText)].InnerText;
}
else
{
dataRow[cellcount] = cell.CellValue.Text;
}
cellcount++;
}
dataTable.Rows.Add(dataRow);
}
}
}
}
//do whatever you want with the DataTable
return dataTable;
}
catch (Exception ex)
{
//handle an error
return dataTable;
}
}
private static Tuple<int, int> returnCellReference(string cellRef)
{
int startIndex = cellRef.IndexOfAny("0123456789".ToCharArray());
string column = cellRef.Substring(0, startIndex);
int row = Int32.Parse(cellRef.Substring(startIndex));
return new Tuple<int,int>(TextToNumber(column), row);
}
private static int TextToNumber(string text)
{
return text
.Select(c => c - 'A' + 1)
.Aggregate((sum, next) => sum * 26 + next);
}
private static bool isInTable(Cell testCell, string tableRef, bool headerRow){
Tuple<int, int> cellRef = returnCellReference(testCell.CellReference.ToString());
if (tableRef.Contains(":"))
{
int header = 0;
if (headerRow)
{
header = 1;
}
string[] tableExtremes = tableRef.Split(':');
Tuple<int, int> startCell = returnCellReference(tableExtremes[0]);
Tuple<int, int> endCell = returnCellReference(tableExtremes[1]);
if (cellRef.Item1 >= startCell.Item1
&& cellRef.Item1 <= endCell.Item1
&& cellRef.Item2 >= startCell.Item2 + header
&& cellRef.Item2 <= endCell.Item2) { return true; }
else { return false; }
}
else if (cellRef.Equals(returnCellReference(tableRef)))
{
return true;
}
else
{
return false;
}
}
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With