Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Cleaning up RTF text

I'd like to take some RTF input and clean it to remove all RTF formatting except \ul \b \i to paste it into Word with minor format information.

The command used to paste into Word will be something like: oWord.ActiveDocument.ActiveWindow.Selection.PasteAndFormat(0) (with some RTF text already in the Clipboard)

{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Courier New;}}
{\colortbl ;\red255\green255\blue140;}
\viewkind4\uc1\pard\highlight1\lang3084\f0\fs18 The company is a global leader in responsible tourism and was \ul the first major hotel chain in North America\ulnone  to embrace environmental stewardship within its daily operations\highlight0\par

Do you have any idea on how I can clean up the RTF safely with some regular expressions or something? I am using VB.NET to do the processing but any .NET language sample will do.

like image 412
Vincent Avatar asked Aug 21 '08 16:08

Vincent


2 Answers

I would use a hidden RichTextBox, set the Rtf member, then retrieve the Text member to sanitize the RTF in a well-supported way. Then I would use manually inject the desired formatting afterwards.

like image 182
Nick Avatar answered Nov 09 '22 07:11

Nick


I'd do something like the following:

Dim unformatedtext As String

someRTFtext = Replace(someRTFtext, "\ul", "[ul]")
someRTFtext = Replace(someRTFtext, "\b", "[b]")
someRTFtext = Replace(someRTFtext, "\i", "[i]")

Dim RTFConvert As RichTextBox = New RichTextBox
RTFConvert.Rtf = someRTFtext
unformatedtext = RTFConvert.Text

unformatedtext = Replace(unformatedtext, "[ul]", "\ul")
unformatedtext = Replace(unformatedtext, "[b]", "\b")
unformatedtext = Replace(unformatedtext, "[i]", "\i")

Clipboard.SetText(unformatedtext)

oWord.ActiveDocument.ActiveWindow.Selection.PasteAndFormat(0)
like image 42
Martin Avatar answered Nov 09 '22 06:11

Martin