How can I read a docx file word by word (with styles)? I want to compare two docx files word by word and, based on the differences, write the result into another docx file (using C# and OOXML). I have tried DocumentFormat.OpenXml.Extensions.dll, OpenXMLdiff.dll and ICSharpCode.SharpZipLib.dll, but none of them gives me a way to read word by word (ICSharpCode.SharpZipLib does give word-by-word access, but not the style associated with each word).
Any help on this will be very useful.
Double-click the folder you wish to inspect (for example, word). Double-click the file you wish to inspect (for example, document.xml). The document last selected should now appear in an Internet Explorer tab.
A docx file is an Open XML formatted Microsoft Word document. Not all applications can read all file formats, and in some cases an application may only be able to read parts of a file. For example, an application may be able to read the text, but not the formatting, of a file that uses a format other than its own.
DOCX was originally developed by Microsoft as an XML-based format to replace the proprietary binary format that uses the .doc file extension. Since Word 2007, DOCX has been the default format for the Save operation.
How to open a DOCX file. You can open a DOCX file with Microsoft Word in Windows and macOS. Word is the best option for opening DOCX files because it fully supports the formatting of Word documents, which includes images, charts, tables, and text spacing and alignment. Word is also available for Android and iOS devices ...
This MSDN article shows how to reliably retrieve the exact text of a document, paragraph by paragraph.
http://msdn.microsoft.com/en-us/library/ff686712.aspx
At the same time, you can determine the style for each paragraph. That is pretty easy. The following blog post shows how to retrieve the style and text for each paragraph:
http://blogs.msdn.com/b/ericwhite/archive/2009/02/16/finding-paragraphs-by-style-name-or-content-in-an-open-xml-word-processing-document.aspx
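As a rough illustration of the paragraph-by-paragraph idea (not the code from the linked articles), here is a minimal sketch that uses only System.IO.Compression and LINQ to XML. The class and method names (DocxParagraphReader, LoadMainPart, ReadParagraphs) are made up for this example. It assumes the style lives in w:pPr/w:pStyle, returns the style id rather than the display name, returns null when a paragraph has no explicit style, and ignores tracked revisions, tabs, breaks and text boxes nested inside a paragraph.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class DocxParagraphReader
{
    // Main WordprocessingML namespace.
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // A .docx file is a ZIP package; the body lives in word/document.xml.
    public static XDocument LoadMainPart(string docxPath)
    {
        using (ZipArchive zip = ZipFile.OpenRead(docxPath))
        {
            ZipArchiveEntry entry = zip.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found.");
            using (Stream stream = entry.Open())
                return XDocument.Load(stream);
        }
    }

    // Returns (style id, text) for every w:p element in document order.
    // Style is null when the paragraph has no explicit w:pStyle.
    public static List<(string Style, string Text)> ReadParagraphs(XDocument mainPart)
    {
        return mainPart.Descendants(W + "p")
            .Select(p => (
                Style: (string)p.Element(W + "pPr")
                                ?.Element(W + "pStyle")
                                ?.Attribute(W + "val"),
                // Concatenate all w:t text nodes; deleted text (w:delText) is skipped.
                Text: string.Concat(p.Descendants(W + "t").Select(t => t.Value))))
            .ToList();
    }
}
```

Splitting each Text value on whitespace then gives words that carry the paragraph's style id. Character-level formatting (bold, italic, character styles) is stored per run in w:rPr, so a word-by-word comparison that needs it would have to walk the w:r elements instead.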
Comparing the two? It depends on your exact desired semantics. One approach would be to create an XML document that contains the paragraphs and their styles for each file, then compare the two XML documents. The XML document might look something like this:
<Root>
  <Para>
    <Style>Normal</Style>
    <Text>This is the text of the paragraph.</Text>
  </Para>
  <Para>
    <Style>Heading1</Style>
    <Text>Overview of the Process</Text>
  </Para>
</Root>
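A minimal sketch of that approach, assuming the (style, text) pairs have already been extracted from each file. The names ParagraphSummary, Build and DifferingParagraphs are hypothetical. The comparison is deliberately naive: it matches paragraphs by position, so a single inserted paragraph marks everything after it as changed. A real diff would align the two sequences first (for example with a longest-common-subsequence algorithm).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, for illustration only.
public static class ParagraphSummary
{
    // Builds <Root><Para><Style/><Text/></Para>...</Root> from (style, text) pairs.
    public static XElement Build(IEnumerable<(string Style, string Text)> paragraphs)
    {
        return new XElement("Root",
            paragraphs.Select(p => new XElement("Para",
                new XElement("Style", p.Style),
                new XElement("Text", p.Text))));
    }

    // Returns the zero-based positions at which the two summaries differ,
    // including positions that exist in only one of them.
    public static List<int> DifferingParagraphs(XElement left, XElement right)
    {
        List<XElement> a = left.Elements("Para").ToList();
        List<XElement> b = right.Elements("Para").ToList();
        var result = new List<int>();
        int max = Math.Max(a.Count, b.Count);
        for (int i = 0; i < max; i++)
        {
            XElement x = i < a.Count ? a[i] : null;
            XElement y = i < b.Count ? b[i] : null;
            // DeepEquals compares element names and content recursively.
            if (!XNode.DeepEquals(x, y))
                result.Add(i);
        }
        return result;
    }
}
```

The returned positions tell you which paragraphs to mark up when writing the output document. For a word-level comparison, the same structure could hold one element per word instead of one per paragraph.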