I would like to upload a Word 2007 or greater docx file to my web server and convert the table of contents to a simple xml structure. Doing this on the desktop with traditional VBA seems like it would have been easy. Looking at the WordprocessingML XML data used to create the docx file is confusing. Is there a way (without COM) to navigate the document in more of an object-oriented fashion?
You can upload and download files with the Google Docs app for Android. Import: You can open and edit DOC, DOCX, ODT, TXT, RTF, and HTML files.
If you are using Microsoft Office Word 2007 or Word 2010, you can open . docx or . docm files that were created in Word 2016 and 2013.
I highly recommend looking into the Open XML SDK 2.0. It's a CTP, but I've found it extremely useful in manipulating xmlx files without having to deal with COM at all. The documentation is a bit sketchy, but the key thing to look for is the DocumentFormat.OpenXml.Packaging.WordprocessingDocument class. You can pick apart the .docx document if you rename the extension to .zip and dig into the XML files there. From doing that, it looks like a Table of Contents is contained in a "Structured Document" tag and that things like the headings are in a hyperlink from there. Putzing around with it a bit, I found that something like this should work (or at least give you a starting point).
WordprocessingDocument wordDoc = WordprocessingDocument.Open(Filename, false);
SdtBlock contents = wordDoc.MainDocumentPart.Document.Descendants<SdtBlock>().First();
List<string> contentList = new List<string>();
foreach (Hyperlink section in contents.Descendants<Hyperlink>())
{
contentList.Add(section.Descendants<Text>().First().Text);
}
Here is a blog post on querying Open XML WordprocessingML documents using LINQ to XML. Using that code, you can write a query as follows:
using (WordprocessingDocument doc =
WordprocessingDocument.Open(filename, false))
{
foreach (var p in doc.MainDocumentPart.Paragraphs())
{
Console.WriteLine("Style: {0} Text: >{1}<",
p.StyleName.PadRight(16), p.Text);
foreach (var c in p.Comments())
Console.WriteLine(
" Comment Author:{0} Text:>{1}<",
c.Author, c.Text);
}
}
Blog post: Open XML SDK and LINQ to XML
-Eric
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With