I need to use C# programmatically to append several preexisting docx files into a single, long docx file, including special markup like bullets and images. Header and footer information will be stripped out, so those won't be around to cause any problems.
I can find plenty of information about manipulating an individual docx file with .NET Framework 3, but nothing easy or obvious about how you would merge files. There is also a third-party program (Acronis.Words) that will do it, but it is prohibitively expensive.
Automating through Word has been suggested, but my code is going to be running on ASP.NET on an IIS web server, so going out to Word is not an option for me. Sorry for not mentioning that in the first place.
Despite all the good suggestions and solutions submitted, I developed an alternative. In my opinion, you should avoid using Word in server applications entirely. So I worked with OpenXML, but I could not get it to work with AltChunk. Instead, I add the content of each document to the body of the first one. I receive a List of byte[] instead of a List of file names, but you can easily change the code to suit your needs.
    using System;
    using System.Collections.Generic;
    using System.Globalization;
    using System.IO;
    using System.Xml.Linq;
    using DocumentFormat.OpenXml.Packaging;
    using DocumentFormat.OpenXml.Wordprocessing;

    namespace OfficeMergeControl
    {
        public class CombineDocs
        {
            // Note: OfficeMergeControlException is a custom exception type that is
            // not shown here; it is assumed to be defined elsewhere in the project.
            public byte[] OpenAndCombine(IList<byte[]> documents)
            {
                // The first document becomes the target that the others are appended to.
                MemoryStream mainStream = new MemoryStream();
                mainStream.Write(documents[0], 0, documents[0].Length);
                mainStream.Position = 0;

                int pointer = 1;
                byte[] ret;
                try
                {
                    using (WordprocessingDocument mainDocument = WordprocessingDocument.Open(mainStream, true))
                    {
                        XElement newBody = XElement.Parse(mainDocument.MainDocumentPart.Document.Body.OuterXml);

                        for (pointer = 1; pointer < documents.Count; pointer++)
                        {
                            // Source documents are only read, so open them read-only and dispose them.
                            using (WordprocessingDocument tempDocument =
                                WordprocessingDocument.Open(new MemoryStream(documents[pointer]), false))
                            {
                                XElement tempBody = XElement.Parse(tempDocument.MainDocumentPart.Document.Body.OuterXml);

                                // Append the children of the body rather than the <w:body>
                                // element itself, so a body is not nested inside a body.
                                newBody.Add(tempBody.Elements());
                                mainDocument.MainDocumentPart.Document.Body = new Body(newBody.ToString());
                                mainDocument.MainDocumentPart.Document.Save();
                                mainDocument.Package.Flush();
                            }
                        }
                    }
                }
                catch (OpenXmlPackageException oxmle)
                {
                    throw new OfficeMergeControlException(string.Format(CultureInfo.CurrentCulture, "Error while merging files. Document index {0}", pointer), oxmle);
                }
                catch (Exception e)
                {
                    throw new OfficeMergeControlException(string.Format(CultureInfo.CurrentCulture, "Error while merging files. Document index {0}", pointer), e);
                }
                finally
                {
                    // Capture the bytes before the stream is disposed.
                    ret = mainStream.ToArray();
                    mainStream.Dispose();
                }
                return ret;
            }
        }
    }
I hope this helps you.
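Since the method above takes a list of byte[] rather than file names, one way to bridge the two is a small loader that reads each file into memory first. This is only a sketch: `DocxFileLoader` and `LoadAll` are hypothetical names introduced here for illustration, not part of the answer's code or of the Open XML SDK.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not part of the original answer): turns a list of
// file paths into the IList<byte[]> shape that OpenAndCombine expects.
public static class DocxFileLoader
{
    public static IList<byte[]> LoadAll(IEnumerable<string> paths)
    {
        List<byte[]> documents = new List<byte[]>();
        foreach (string path in paths)
        {
            // File.ReadAllBytes opens, reads, and closes the file in one call.
            documents.Add(File.ReadAllBytes(path));
        }
        return documents;
    }
}
```

Assuming the `CombineDocs` class from the answer is available, the result could then be passed along as `new CombineDocs().OpenAndCombine(DocxFileLoader.LoadAll(paths))` and the returned bytes written out with `File.WriteAllBytes`.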
You don't need to use automation. DOCX files are based on the OpenXML Formats. They are just zip files with a bunch of XML and binary parts (think files) inside. You can open them with the Packaging API (System.IO.Packaging in WindowsBase.dll) and manipulate them with any of the XML classes in the Framework.
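To illustrate the "zip file with parts" idea, here is a minimal sketch that opens a package with System.IO.Packaging and lists the parts it contains. `DocxInspector` and `ListParts` are hypothetical names chosen for this example; on .NET Framework the code assumes a reference to WindowsBase.dll, and on newer runtimes the System.IO.Packaging package.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;

// Hypothetical helper for illustration: lists the parts inside an
// Open XML package (such as a .docx) without automating Word.
public static class DocxInspector
{
    public static List<string> ListParts(Stream docxStream)
    {
        List<string> result = new List<string>();

        // Open the package read-only; the stream must be readable and seekable.
        using (Package package = Package.Open(docxStream, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in package.GetParts())
            {
                // Each part has a URI inside the package and a content type.
                result.Add(part.Uri + " (" + part.ContentType + ")");
            }
        }
        return result;
    }
}
```

For a typical Word document, the main body would be expected to show up as a part such as /word/document.xml, with images and other resources stored as separate parts alongside it.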
Check out OpenXMLDeveloper.org for details.