
How to replace a Paragraph's text using the OpenXML SDK

I am parsing some OpenXML Word documents using the .NET Open XML SDK 2.0. I need to replace certain sentences with other sentences as part of the processing. While iterating over the paragraphs, I know when I've found something I need to replace, but I am stumped as to how to replace it.

For example, let's say I need to replace the sentence "a contract exclusively for construction work that is not building work." with the HTML snippet below, which points to SharePoint Reusable Content.

<span class="ms-rtestate-read ms-reusableTextView" contentEditable="false" id="__publishingReusableFragment" fragmentid="/Sites/Sandbox/ReusableContent/132_.000" >a contract exclusively for construction work that is not building work.</span>

PS: I already have the docx-to-HTML conversion working using XSLT, so that is not a problem at this stage.

The InnerText property of the Paragraph node gives me the proper text, but the property itself is not settable. Regex.Match(currentParagraph.InnerText, currentString).Success returns true, which tells me that the current paragraph contains the text I want.

Since InnerText is not settable, I tried creating a new paragraph from the OuterXml, as shown below.

string modifiedOuterxml = Regex.Replace(currentParagraph.OuterXml, currentString, reusableContentString);
OpenXmlElement parent = currentParagraph.Parent;
Paragraph modifiedParagraph = new Paragraph(modifiedOuterxml);
parent.ReplaceChild<Paragraph>(modifiedParagraph, currentParagraph);

I am not too concerned about formatting at this level, and the paragraph doesn't appear to have any, but the OuterXml contains extra elements that split the sentence across runs and defeat the regex:

..."16" /><w:lang w:val="en-AU" /></w:rPr><w:t>a</w:t></w:r><w:proofErr w:type="gramEnd" /> <w:r w:rsidRPr="00C73B58"><w:rPr><w:sz w:val="16" /><w:szCs w:val="16" /><w:lang w:val="en-AU" /></w:rPr><w:t xml:space="preserve"> contract exclusively for construction work that is not building work.</w:t></w:r></w:p>
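To illustrate why the regex misses, here is a minimal sketch using only System.Xml.Linq rather than the Open XML SDK. The XML is a trimmed-down, hypothetical version of the fragment above (run properties removed), and the class name is made up for the example. The sentence never appears contiguously in the markup, but it does once the w:t values are concatenated, which is roughly what InnerText returns.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

class SplitRunsDemo
{
    static void Main()
    {
        XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

        // Trimmed-down paragraph: the sentence is split across two runs,
        // with a proofErr element sitting between them.
        string xml =
            "<w:p xmlns:w=\"" + w.NamespaceName + "\">" +
            "<w:r><w:t>a</w:t></w:r>" +
            "<w:proofErr w:type=\"gramEnd\" />" +
            "<w:r><w:t xml:space=\"preserve\"> contract exclusively for construction work that is not building work.</w:t></w:r>" +
            "</w:p>";
        string target = "a contract exclusively for construction work that is not building work.";

        XElement p = XElement.Parse(xml, LoadOptions.PreserveWhitespace);

        // Concatenate the text of every w:t element, ignoring the markup in between.
        string innerText = string.Concat(p.Descendants(w + "t").Select(t => t.Value));

        Console.WriteLine(xml.Contains(target));       // False: markup interrupts the sentence
        Console.WriteLine(innerText.Contains(target)); // True: the joined text matches
    }
}
```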

So in summary, how would I replace the text in an OpenXML Paragraph with other text, even at the expense of losing some of the formatting?

Chaitanya asked Nov 25 '10 10:11



2 Answers

Fixed it myself. The key was to remove all the runs from the current paragraph and add a new run containing the modified text:

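// Escape the search string so regex metacharacters such as "." are matched literally.
string modifiedString = Regex.Replace(currentParagraph.InnerText, Regex.Escape(currentString), reusableContentString);
// Drop the existing runs (and their formatting), then add a single run with the new text.
currentParagraph.RemoveAllChildren<Run>();
currentParagraph.AppendChild<Run>(new Run(new Text(modifiedString)));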
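If losing all run formatting is too much, one possible variation is to copy the first run's properties onto the replacement run. This is only a sketch: the ParagraphTextReplacer class and ReplaceText method are hypothetical names, it uses a plain string replace instead of a regex, and the `?.` operator assumes a C# 6 or later compiler.

```csharp
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Wordprocessing;

static class ParagraphTextReplacer
{
    // Hypothetical helper: replaces oldText in the paragraph's text and
    // reuses the first run's formatting for the single replacement run.
    public static bool ReplaceText(Paragraph paragraph, string oldText, string newText)
    {
        string current = paragraph.InnerText;
        if (!current.Contains(oldText))
        {
            return false;
        }

        // Clone the first run's properties (size, language, ...) before the runs are removed.
        Run firstRun = paragraph.Elements<Run>().FirstOrDefault();
        OpenXmlElement props = firstRun?.RunProperties?.CloneNode(true);

        paragraph.RemoveAllChildren<Run>();

        Run run = new Run();
        if (props != null)
        {
            run.AppendChild(props); // run properties must come first in the run
        }
        // Preserve leading and trailing spaces in the new text.
        run.AppendChild(new Text(current.Replace(oldText, newText))
        {
            Space = SpaceProcessingModeValues.Preserve
        });
        paragraph.AppendChild(run);
        return true;
    }
}
```

Usage would look like `ParagraphTextReplacer.ReplaceText(currentParagraph, currentString, reusableContentString);`. One caveat: InnerText also includes text from runs nested in other elements (hyperlinks, for example), while RemoveAllChildren<Run>() only removes runs that are direct children of the paragraph, so paragraphs with nested runs would need extra handling.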

Chaitanya answered Nov 01 '22 23:11


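A paragraph's text lives in Text elements inside its runs, so you can find the Text element and update its text. Note that this only matches when the whole sentence sits in a single Text element, which is not the case for the paragraph shown in the question, where the sentence is split across two runs. For example: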

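// mainPart is assumed to be the document's MainDocumentPart.
var text = mainPart.Document.Descendants<Text>().FirstOrDefault(e => e.Text == "a contract exclusively for construction work that is not building work.");
if (text != null)
{
    text.Text = "New text here";
}
// Write the modified document tree back to the part.
mainPart.Document.Save();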
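For completeness, here is a sketch of the same idea wrapped in a method that opens the file itself and matches substrings rather than whole Text values. The TextElementReplacer class, the ReplaceInTextElements method and the path parameter are hypothetical, and the limitation remains that a sentence split across several Text elements is not found.

```csharp
using System.Linq;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

static class TextElementReplacer
{
    // Hypothetical helper: replaces oldText inside every Text element that
    // contains it and returns how many Text elements were changed.
    public static int ReplaceInTextElements(string path, string oldText, string newText)
    {
        // Open the document for editing (second argument: isEditable = true).
        using (WordprocessingDocument doc = WordprocessingDocument.Open(path, true))
        {
            MainDocumentPart mainPart = doc.MainDocumentPart;
            int changed = 0;

            // Materialize the matches first so the tree is not modified while enumerating.
            var matches = mainPart.Document.Descendants<Text>()
                                  .Where(t => t.Text != null && t.Text.Contains(oldText))
                                  .ToList();
            foreach (Text text in matches)
            {
                text.Text = text.Text.Replace(oldText, newText);
                changed++;
            }

            mainPart.Document.Save();
            return changed;
        }
    }
}
```

A return value of 0 for a sentence that is visibly present in the document is a hint that the sentence spans more than one run.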

Nick Hoàng answered Nov 01 '22 23:11