Merge multiple word documents into one using OpenXML and XElement

As the title states, I am trying to merge multiple Word (.docx) files into one Word document. Each of these documents is one page long. I am using some of the code from this post in this implementation. The issue I am running into is that only the first document gets written properly: each subsequent iteration appends a new document, but its content is the same as the first document's.

Here is the code I am using:

//list that holds the file paths
List<String> fileNames = new List<string>();
fileNames.Add("filePath");
fileNames.Add("filePath");
fileNames.Add("filePath");
fileNames.Add("filePath");
fileNames.Add("filePath");

//get the first document
MemoryStream mainStream = new MemoryStream();
byte[] buffer = File.ReadAllBytes(fileNames[0]);
mainStream.Write(buffer, 0, buffer.Length);

using (WordprocessingDocument mainDocument = WordprocessingDocument.Open(mainStream, true))
{
    //xml for the new document
    XElement newBody = XElement.Parse(mainDocument.MainDocumentPart.Document.Body.OuterXml);
    //iterate through each file
    for (int i = 1; i < fileNames.Count; i++)
    {
        //read in the document
        byte[] tempBuffer = File.ReadAllBytes(fileNames[i]);
        WordprocessingDocument tempDocument = WordprocessingDocument.Open(new MemoryStream(tempBuffer), true);
        //new documents XML
        XElement tempBody = XElement.Parse(tempDocument.MainDocumentPart.Document.Body.OuterXml);
        //add the new xml
        newBody.Add(tempBody);
        string str = newBody.ToString();
        //write to the main document and save
        mainDocument.MainDocumentPart.Document.Body = new Body(newBody.ToString());
        mainDocument.MainDocumentPart.Document.Save();
        mainDocument.Package.Flush();
        tempBuffer = null;
    }
    //write entire stream to new file
    FileStream fileStream = new FileStream("xmltest.docx", FileMode.Create);
    mainStream.WriteTo(fileStream);
    //ret = mainStream.ToArray();
    mainStream.Close();
    mainStream.Dispose();
}

Again, the problem is that each appended document has the same content as the first document, so when I run this the output is a document with five identical pages. I've tried changing the order of the documents in the list and get the same result, so it is nothing specific to one document. Could anyone suggest what I am doing wrong here? I've looked through the code and can't explain the behavior I am seeing. Any suggestions would be appreciated. Thanks much!

Edit: I think this may have something to do with the fact that the documents I am trying to merge were generated with custom XML parts. I suspect the XPath expressions in the documents are somehow pointing to the same content. The thing is, I can open each of these documents and see the proper content; it's only when I merge them that I see the issue.

TheMethod asked Jul 23 '12 15:07



2 Answers

This solution uses DocumentFormat.OpenXml (the Open XML SDK) and imports each additional file as an altChunk:

using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

public static void Join(params string[] filepaths)
{
    //filepaths = new[] { "D:\\one.docx", "D:\\two.docx", "D:\\three.docx", "D:\\four.docx", "D:\\five.docx" };
    //nothing to merge unless there are at least two files
    if (filepaths == null || filepaths.Length < 2)
        return;

    //the first file is opened for editing and becomes the merged document
    using (WordprocessingDocument myDoc = WordprocessingDocument.Open(filepaths[0], true))
    {
        MainDocumentPart mainPart = myDoc.MainDocumentPart;

        for (int i = 1; i < filepaths.Length; i++)
        {
            //embed the whole file as an alternative format import part
            string altChunkId = "AltChunkId" + i;
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
                AlternativeFormatImportPartType.WordprocessingML, altChunkId);
            using (FileStream fileStream = File.Open(filepaths[i], FileMode.Open))
            {
                chunk.FeedData(fileStream);
            }
            DocumentFormat.OpenXml.Wordprocessing.AltChunk altChunk = new DocumentFormat.OpenXml.Wordprocessing.AltChunk();
            altChunk.Id = altChunkId;
            //new page, if you like it...
            mainPart.Document.Body.AppendChild(new Paragraph(new Run(new Break() { Type = BreakValues.Page })));
            //next document
            mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
        }
        //the using block closes the document after saving
        mainPart.Document.Save();
    }
}

Emanuele Greco answered Oct 13 '22 13:10


The way you are merging the documents may not always work properly. You can try one of these approaches:

  1. Using AltChunk, as in http://blogs.msdn.com/b/ericwhite/archive/2008/10/27/how-to-use-altchunk-for-document-assembly.aspx

  2. Using the DocumentBuilder.BuildDocument method from http://powertools.codeplex.com/

If you still face a similar issue, you can find the data-bound controls prior to the merge and assign data to these controls from the CustomXml part. This approach is implemented in the method AssignContentFromCustomXmlPartForDataboundControl of the OpenXmlHelper class. The code can be downloaded from http://worddocgenerator.codeplex.com/

Atul Verma answered Oct 13 '22 12:10
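
A side note on the question's code: the call newBody.Add(tempBody) adds the second document's entire <w:body> element as a child of the first body, rather than adding its paragraphs. The sketch below illustrates this with System.Xml.Linq only, using tiny hand-built stand-ins for the two bodies. The class BodyMergeDemo and its methods are hypothetical names made up for this illustration, and filtering out <w:sectPr> is an assumption about what a flat merge would want. It does not address styles, images, relationships, or custom XML data bindings (the cause suspected in the question's edit), which is why the answers above point to altChunk or DocumentBuilder instead.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class BodyMergeDemo
{
    // Main WordprocessingML namespace (the "w" prefix).
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Builds a minimal stand-in for a document body holding one paragraph.
    public static XElement MakeBody(string text) =>
        new XElement(W + "body",
            new XAttribute(XNamespace.Xmlns + "w", W.NamespaceName),
            new XElement(W + "p",
                new XElement(W + "r", new XElement(W + "t", text))));

    // Mirrors newBody.Add(tempBody): the whole second <w:body>
    // ends up nested inside the first one.
    public static XElement MergeNested(XElement main, XElement extra)
    {
        XElement result = new XElement(main);   // deep copy, inputs stay untouched
        result.Add(new XElement(extra));
        return result;
    }

    // Alternative: copy only the children of the second body,
    // skipping its section properties (assumed to be unwanted here).
    public static XElement MergeFlat(XElement main, XElement extra)
    {
        XElement result = new XElement(main);
        result.Add(extra.Elements().Where(e => e.Name != W + "sectPr"));
        return result;
    }

    // Counts direct children of a body with the given local name.
    public static int CountDirect(XElement body, string localName) =>
        body.Elements(W + localName).Count();

    public static void Main()
    {
        XElement first = MakeBody("first");
        XElement second = MakeBody("second");
        Console.WriteLine(MergeNested(first, second));
        Console.WriteLine(MergeFlat(first, second));
    }
}
```

Comparing the two printed trees shows the difference: the nested variant has a <w:body> inside a <w:body>, while the flat variant has both paragraphs as siblings.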