Search and replace placeholders split across multiple <w:t> elements

I'm trying to create reports from .docx templates using the Open XML SDK 2.5. Within the templates I have defined some placeholders that get replaced by real values. The placeholders can be written in various formats, such as

<#Name#>
or
<!#Name#!>
or
#Name#
or
{{Name}}

The placeholder format could also be something else, as long as the placeholders can be clearly identified within the text.

The problem I am currently facing is that a placeholder is often split across multiple <w:t> elements (DocumentFormat.OpenXml.Wordprocessing.Text) within a <w:p> element (DocumentFormat.OpenXml.Wordprocessing.Paragraph). An example:

<w:p w:rsidR="003137E0" w:rsidRDefault="008C62F1" w:rsidP="00D43D55">
  <w:r>
    <w:t xml:space="preserve">#FirstName# </w:t>
  </w:r>
  <w:r w:rsidR="00C93A70">
    <w:t>#LastName</w:t>
  </w:r>
  <w:r w:rsidR="005F49B7">
    <w:t>#</w:t>
  </w:r>
</w:p>

Here the placeholder #FirstName# is easily identifiable because it sits within a single <w:t> element, but the placeholder #LastName# is split across multiple <w:t> elements, so I cannot use a simple Regex on the text of the document like

Regex placeholderRegex = new Regex(@"#[\w]*#");

document.MainDocumentPart.Document.Body.Descendants<Text>().Where(t=> placeholderRegex.IsMatch(t.Text))
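
To make the failure concrete, here is a minimal sketch that does not use the Open XML SDK at all. It assumes a simplified copy of the paragraph above (rsid attributes dropped) and reads it with System.Xml.Linq; the class and method names are hypothetical. It compares matching each <w:t> on its own with matching the concatenated text of the paragraph.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class SplitPlaceholderDemo
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Simplified copy of the paragraph above (assumption: rsid attributes are irrelevant here).
    public const string Xml =
        "<w:p xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">" +
        "<w:r><w:t xml:space=\"preserve\">#FirstName# </w:t></w:r>" +
        "<w:r><w:t>#LastName</w:t></w:r>" +
        "<w:r><w:t>#</w:t></w:r>" +
        "</w:p>";

    // Number of <w:t> elements that contain a complete placeholder on their own.
    public static int CountPerElement(XElement paragraph, Regex regex) =>
        paragraph.Descendants(W + "t").Count(t => regex.IsMatch(t.Value));

    // Number of placeholders found in the concatenated text of the whole paragraph.
    public static int CountInJoinedText(XElement paragraph, Regex regex) =>
        regex.Matches(string.Concat(paragraph.Descendants(W + "t").Select(t => t.Value))).Count;
}
```

With XElement.Parse(SplitPlaceholderDemo.Xml) and new Regex(@"#[\w]*#"), CountPerElement returns 1 (only #FirstName#), while CountInJoinedText returns 2. So detection has to work on the joined paragraph text, and the replacement then has to be mapped back onto the individual <w:t> elements.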

I have no control over how the templates are defined, and I do not want to put constraints on how users create them. It is also not clear to me when a placeholder gets split into multiple <w:t> elements.

Another example, using {{[\w]*}} as the placeholder format:

Text (Docx)

{{Ort}}
And this {{placeholder}} is within the text 

Xml (OpenXML)

<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas" xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships" xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math" xmlns:v="urn:schemas-microsoft-com:vml" xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing" xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing" xmlns:w10="urn:schemas-microsoft-com:office:word" xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main" xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml" xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup" xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk" xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml" xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape" mc:Ignorable="w14 wp14">
  <w:body>
    <w:p w:rsidR="007B60F2" w:rsidRDefault="00BB7370" w:rsidP="00D43D55">
      <w:pPr>
        <w:rPr>
          <w:lang w:val="en-US" />
        </w:rPr>
      </w:pPr>
      <w:r w:rsidRPr="00114EA7">
        <w:rPr>
          <w:lang w:val="en-US" />
        </w:rPr>
        <w:t>{{</w:t>
      </w:r>
      <w:r w:rsidR="00C93A70" w:rsidRPr="00114EA7">
        <w:rPr>
          <w:lang w:val="en-US" />
        </w:rPr>
        <w:t>Ort</w:t>
      </w:r>
      <w:r w:rsidR="00114EA7" w:rsidRPr="00114EA7">
        <w:rPr>
          <w:lang w:val="en-US" />
        </w:rPr>
        <w:t>}}</w:t>
      </w:r>
    </w:p>
    <w:p w:rsidR="00EC3BED" w:rsidRPr="00114EA7" w:rsidRDefault="00C310E0" w:rsidP="00D43D55">
      <w:pPr>
        <w:rPr>
          <w:lang w:val="en-US" />
        </w:rPr>
      </w:pPr>
      <w:r w:rsidRPr="00114EA7">
        <w:rPr>
          <w:lang w:val="en-US" />
        </w:rPr>
        <w:t xml:space="preserve">This is a text with a </w:t>
      </w:r>
      <w:r w:rsidR="00A07A5D">
        <w:rPr>
          <w:lang w:val="en-US" />
        </w:rPr>
        <w:t>{{</w:t>
      </w:r>
      <w:r w:rsidRPr="00114EA7">
        <w:rPr>
          <w:lang w:val="en-US" />
        </w:rPr>
        <w:t>placeholder</w:t>
      </w:r>
      <w:r w:rsidR="00A07A5D">
        <w:rPr>
          <w:lang w:val="en-US" />
        </w:rPr>
        <w:t>}</w:t>
      </w:r>
      <w:r w:rsidR="00114EA7" w:rsidRPr="00114EA7">
        <w:rPr>
          <w:lang w:val="en-US" />
        </w:rPr>
        <w:t>}</w:t>
      </w:r>
      <w:bookmarkStart w:id="0" w:name="_GoBack" />
      <w:bookmarkEnd w:id="0" />
      <w:r w:rsidR="00A07A5D">
        <w:rPr>
          <w:lang w:val="en-US" />
        </w:rPr>
        <w:t>.</w:t>
      </w:r>
    </w:p>
    <w:sectPr w:rsidR="00EC3BED" w:rsidRPr="00114EA7" w:rsidSect="00237721">
      <w:pgSz w:w="11906" w:h="16838" />
      <w:pgMar w:top="1417" w:right="1417" w:bottom="1134" w:left="1417" w:header="708" w:footer="708" w:gutter="0" />
      <w:cols w:space="708" />
      <w:docGrid w:linePitch="360" />
    </w:sectPr>
  </w:body>
</w:document>

So my question is: what is the way to search and replace placeholders with values using the Open XML SDK? Is there some functionality within the SDK that can help me? Has anybody else solved this problem who can provide assistance?

Jehof asked Apr 11 '14 07:04


3 Answers

Please see "docx4j does not replace variables" for a link to Java source code which solves the problem.

You could implement something similar in C#, or use that code via http://www.nuget.org/packages/docx4j.NET/3.0.1

JasonPlutext answered Oct 21 '22 13:10


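I would do it with something like this (not tested, but I think it will help you):

// requires: using System.Collections.Generic; using System.Text.RegularExpressions; using System.Xml.Linq;

// "w:p" is not a valid XName; element names must be qualified with the namespace
XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
var placeHolders = new List<string>();

//load xml string
var doc = XDocument.Parse(xml);
//or to load from file use XDocument.Load("path_to_xml_file.xml");

//get all <w:p> elements (they are nested inside <w:body>, so use Descendants)
var wpElements = doc.Descendants(w + "p");

foreach (var wp in wpElements)
{
    var wrElements = wp.Descendants(w + "r");
    foreach (var wr in wrElements)
    {
        var wt = (string)wr.Element(w + "t");
        if (wt == null) continue; //run without a text element

        if (Regex.IsMatch(wt, @"\w") || placeHolders.Count == 0)
        {
            //add the string to placeHolders if a word character is found
            placeHolders.Add(wt);
        }
        else
        {
            //no word character found: append it to the last placeholder
            placeHolders[placeHolders.Count - 1] += wt;
        }
    }
}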

jomsk1e answered Oct 21 '22 12:10


Yes, the MS Word application splits even a single word into multiple Run/Text elements (for some reason). And no, the Open XML SDK provides no Find/Replace functionality. But you can create your own for the simplest Paragraph/Run/Text structure. You will need to:

  1. Create a map of all words that records which Run/Text elements each word consists of.
  2. Then search that map for particular words (e.g. <#Name#>), replace the content of the first Run/Text element, and remove all the other ones except the last one, which might also contain part of the next word. In that case, trim the last element so that it no longer includes any part of the replaced word.
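
The two steps above can be sketched as follows. This is a minimal, hedged example and not an SDK feature: it works on the raw document XML with System.Xml.Linq, the names PlaceholderReplacer, Replace and valueFor are hypothetical, and it assumes placeholders never span paragraphs. Instead of removing the trailing Text elements it empties the parts covered by the match, which leaves the runs and their formatting in place.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class PlaceholderReplacer
{
    static readonly XNamespace W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Replaces every match of pattern, even when it is spread over several <w:t> elements.
    // valueFor receives the matched placeholder text and returns its replacement.
    public static void Replace(XDocument doc, Regex pattern, Func<string, string> valueFor)
    {
        foreach (var paragraph in doc.Descendants(W + "p"))
        {
            var texts = paragraph.Descendants(W + "t").ToList();

            // Step 1: the map. starts[i] is the offset of texts[i] in the joined
            // paragraph text; the extra last entry is the total length.
            var starts = new List<int> { 0 };
            foreach (var t in texts) starts.Add(starts[starts.Count - 1] + t.Value.Length);
            string joined = string.Concat(texts.Select(t => t.Value));

            // Step 2: process matches from last to first so that the offsets
            // of earlier matches stay valid while element values change.
            foreach (Match m in pattern.Matches(joined).Cast<Match>().Reverse())
            {
                int matchEnd = m.Index + m.Length;
                bool written = false;
                for (int i = 0; i < texts.Count; i++)
                {
                    // Part of the match inside texts[i], relative to that element.
                    int from = Math.Max(m.Index, starts[i]) - starts[i];
                    int to = Math.Min(matchEnd, starts[i + 1]) - starts[i];
                    if (from >= to) continue; // element holds no part of the match

                    // The first affected element receives the value; later ones only lose their part.
                    string value = texts[i].Value;
                    string replacement = written ? "" : valueFor(m.Value);
                    texts[i].Value = value.Substring(0, from) + replacement + value.Substring(to);
                    // Keep leading/trailing spaces that the edit may have exposed.
                    texts[i].SetAttributeValue(XNamespace.Xml + "space", "preserve");
                    written = true;
                }
            }
        }
    }
}
```

A call such as PlaceholderReplacer.Replace(doc, new Regex(@"\{\{\w*\}\}"), key => values[key]) with a dictionary of values would cover the {{Name}} format from the question. The same offset map could be built over the Text elements of an SDK Paragraph instead of XElement nodes. Nested paragraphs (for example in text boxes) and field codes are not handled specially here.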

Trifun answered Oct 21 '22 13:10