
Convert Word docx to Excel using OpenXML

Is there any way to convert a Word document that contains some tables into an Excel file? Converting just the tables would be very helpful.

Something like this:

  • Open the Word document using OpenXML
  • Find all table XML tags
  • Copy the XML tags
  • Create an Excel file
  • Insert the table XML tags from Word into the new Excel file

I mean:

void OpenWordDoc(string filePath)
{
    _documentWord = WordprocessingDocument.Open(filePath, true);
}

List<string> GetAllTablesXMLTags()
{
    // find and copy
}

void CreateExcelFile(string filePath)
{
    TemplateExcelDocument excelDocument = new TemplateExcelDocument();
    _documentExcel = excelDocument.CreatePackage(filePath);
}

void InsertXmlTagsToExcelFile(string filePath)
{
    CreateExcelFile(filePath);
    var xmlTable = GetAllTablesXMLTags();
    // ... insert to _documentExcel
}

Borysław Bobulski asked May 09 '13 10:05



2 Answers

Your steps are correct.

I would like to share some SDK documentation; I hope it helps to some extent:

Open XML SDK 2.5 for Office

When handling the Word tables:

Working with WordprocessingML tables (Open XML SDK)
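
As a rough sketch of the reading side, assuming the Open XML SDK (the DocumentFormat.OpenXml NuGet package) is referenced: the class and method names below (WordTableReader, ReadTables) are made up for illustration. It only looks at tables that sit directly in the document body, and it does not handle nested tables, merged cells, or tables inside content controls.

```csharp
using System.Collections.Generic;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not part of the Open XML SDK.
public static class WordTableReader
{
    // Returns one entry per table; each table is a list of rows,
    // and each row is a list of cell texts.
    public static List<List<List<string>>> ReadTables(string docxPath)
    {
        // Open read-only (second argument: isEditable = false).
        using (WordprocessingDocument doc = WordprocessingDocument.Open(docxPath, false))
        {
            Body body = doc.MainDocumentPart.Document.Body;

            // Elements<Table>() yields only the tables that are direct children of the body.
            // InnerText concatenates all text in a cell without separators between paragraphs.
            // ToList() materializes everything before the document is disposed.
            return body.Elements<Table>()
                .Select(table => table.Elements<TableRow>()
                    .Select(row => row.Elements<TableCell>()
                        .Select(cell => cell.InnerText)
                        .ToList())
                    .ToList())
                .ToList();
        }
    }
}
```

Calling WordTableReader.ReadTables with the path to a .docx file gives plain strings that can then be written to any target format.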

When processing Excel tables:

Working with the shared string table (Open XML SDK)

Working with SpreadsheetML tables (Open XML SDK)
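
One thing to keep in mind is that WordprocessingML table markup uses a different schema from SpreadsheetML, so the table XML cannot be dropped into a worksheet part unchanged; the cell text has to be rebuilt as rows and cells. The following is a minimal sketch of that writing step, assuming the Open XML SDK (DocumentFormat.OpenXml NuGet package) is referenced. ExcelTableWriter, WriteRows and the sheet name "Tables" are illustrative names, the values are assumed to be non-null, and inline strings are used instead of the shared string table only to keep the example short.

```csharp
using System.Collections.Generic;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

// Hypothetical helper, not part of the Open XML SDK.
public static class ExcelTableWriter
{
    // Writes each inner sequence as one worksheet row of text cells.
    public static void WriteRows(string xlsxPath, IEnumerable<IEnumerable<string>> rows)
    {
        using (SpreadsheetDocument doc =
            SpreadsheetDocument.Create(xlsxPath, SpreadsheetDocumentType.Workbook))
        {
            // A workbook needs a workbook part, at least one worksheet part,
            // and a Sheets list that links the two by relationship id.
            WorkbookPart workbookPart = doc.AddWorkbookPart();
            workbookPart.Workbook = new Workbook();

            WorksheetPart worksheetPart = workbookPart.AddNewPart<WorksheetPart>();
            SheetData sheetData = new SheetData();
            worksheetPart.Worksheet = new Worksheet(sheetData);

            Sheets sheets = workbookPart.Workbook.AppendChild(new Sheets());
            sheets.Append(new Sheet
            {
                Id = workbookPart.GetIdOfPart(worksheetPart),
                SheetId = 1,
                Name = "Tables"
            });

            foreach (IEnumerable<string> rowValues in rows)
            {
                Row row = new Row();
                foreach (string value in rowValues)
                {
                    // Inline string cell: the text is stored in the cell itself.
                    row.Append(new Cell
                    {
                        DataType = CellValues.InlineString,
                        InlineString = new InlineString(new Text(value))
                    });
                }
                sheetData.Append(row);
            }

            workbookPart.Workbook.Save();
        }
    }
}
```

For larger files with many repeated values, the shared string table described in the documentation above is usually the more compact choice.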

terry answered Oct 19 '22 21:10


To get all tables in the docx file, you can use the code below (it uses the Independentsoft.Office library rather than the Open XML SDK):

using System;
using Independentsoft.Office;
using Independentsoft.Office.Word;
using Independentsoft.Office.Word.Tables;

namespace Sample
{
    class Program
    {
        static void Main(string[] args)
        {
            WordDocument doc = new WordDocument("c:\\test.docx");

            Table[] tables = doc.GetTables();

            foreach (Table table in tables)
            {
                //read data
            }

        }
    }
}

To write them into an Excel file, you have to do this for each cell:

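        // Requires a reference to Microsoft.Office.Interop.Excel and a local Excel installation.
        // app, workbooks, workbook, sheets and worksheet are assumed to be fields of the containing class.
        app = new Application(); // assumed: the original snippet does not show how app is created
        app.Visible = false;
        workbooks = app.Workbooks;
        workbook = workbooks.Add(XlWBATemplate.xlWBATWorksheet);
        sheets = workbook.Worksheets;
        worksheet = (_Worksheet)sheets.get_Item(1);
        excel(row, column, "value"); // call once per cell; row and column are 1-based
        workbook.Saved = true;
        workbook.SaveAs(output_file);
        app.UserControl = false;
        app.Quit();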

Finally, the excel function is as follows:

    public void excel(int row, int column, string value)
    {
        worksheet.Cells[row, column] = value;
    }

You can also use CSV or HTML format to create a file that Excel can open. To do that, simply create a file such as example.csv with this comma-delimited content:

col1,col2,col3,col4
val1,val2,val3,val4

Or, in HTML format (for example, saved as example.html and opened from Excel):

<table>
 <tr>
  <td>col1</td>
  <td>col2</td>
  <td>col3</td>
 </tr>
 <tr>
  <td>val1</td>
  <td>val2</td>
  <td>val3</td>
 </tr>
</table>
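
For the CSV route, values that contain commas, quotes, or line breaks need quoting, otherwise the columns shift when Excel opens the file. A small sketch of that, using only the .NET base class library; CsvWriter, Escape and ToCsv are illustrative names, and the quoting follows the convention described in RFC 4180:

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for turning table rows into CSV text.
public static class CsvWriter
{
    // Quotes a field when it contains a comma, a double quote, or a line break,
    // and doubles any embedded double quotes.
    public static string Escape(string field)
    {
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Joins fields with commas and rows with CRLF line endings.
    public static string ToCsv(IEnumerable<IEnumerable<string>> rows)
    {
        return string.Join("\r\n",
            rows.Select(row => string.Join(",", row.Select(field => Escape(field)))));
    }
}
```

The result can be saved with something like System.IO.File.WriteAllText("example.csv", CsvWriter.ToCsv(rows)), where rows holds the cell texts read from the Word tables.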

Abadis answered Oct 19 '22 21:10