I'd like to take some RTF input and strip all RTF formatting except \ul, \b and \i, so that I can paste it into Word with only minimal formatting information.
The command used to paste into Word will be something like: oWord.ActiveDocument.ActiveWindow.Selection.PasteAndFormat(0) (with some RTF text already in the Clipboard)
{\rtf1\ansi\deff0{\fonttbl{\f0\fnil\fcharset0 Courier New;}}
{\colortbl ;\red255\green255\blue140;}
\viewkind4\uc1\pard\highlight1\lang3084\f0\fs18 The company is a global leader in responsible tourism and was \ul the first major hotel chain in North America\ulnone to embrace environmental stewardship within its daily operations\highlight0\par
Do you have any idea on how I can clean up the RTF safely with some regular expressions or something? I am using VB.NET to do the processing but any .NET language sample will do.
I would use a hidden RichTextBox: set its Rtf property, then read back its Text property to strip the RTF in a well-supported way. Then I would manually inject the desired formatting afterwards.
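A minimal sketch of that idea, assuming a Windows Forms project. The module name, the function name StripAndUnderline and its start/length parameters are made up for illustration; how you work out which character range to format is left open here.

```vbnet
Imports System.Drawing
Imports System.Windows.Forms

Module RtfCleaner
    ' Hypothetical helper: keeps only the text of sourceRtf, then underlines
    ' the character range that begins at start and spans length characters.
    Function StripAndUnderline(sourceRtf As String, start As Integer, length As Integer) As String
        Using box As New RichTextBox()
            box.Rtf = sourceRtf                ' let the control parse the RTF
            Dim plain As String = box.Text     ' text only, formatting discarded
            box.Clear()
            box.Text = plain                   ' reload as unformatted text
            box.Select(start, length)          ' select the range to format
            box.SelectionFont = New Font(box.Font, FontStyle.Underline)
            Return box.Rtf                     ' RTF generated by the control
        End Using
    End Function
End Module
```

The returned RTF still contains the control's own header (font table and so on), but no highlight, colour or language codes from the source. It could then be placed on the clipboard with Clipboard.SetText(result, TextDataFormat.Rtf) before pasting into Word.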
I'd do something like the following:
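' Requires Imports System.Text.RegularExpressions and System.Windows.Forms.
' Mark only the control words to keep. The lookahead stops \b from matching
' \blue (colour table) and \ul from matching \ulnone; " ?" eats the delimiter space.
Dim keep As String = "ulnone|ul|b0|b|i0|i"
someRTFtext = Regex.Replace(someRTFtext, "\\(" & keep & ")(?![a-z0-9]) ?", "[[$1]]")
' Let a RichTextBox parse the RTF and drop every other piece of formatting.
Dim unformattedText As String
Using RTFConvert As New RichTextBox()
    RTFConvert.Rtf = someRTFtext
    unformattedText = RTFConvert.Text
End Using
' Escape RTF special characters and turn line breaks into \par.
unformattedText = unformattedText.Replace("\", "\\").Replace("{", "\{").Replace("}", "\}")
unformattedText = unformattedText.Replace(vbLf, "\par" & vbLf)
' Restore the markers as control words and wrap the text in a minimal RTF document.
Dim cleanedRtf As String = "{\rtf1\ansi " & Regex.Replace(unformattedText, "\[\[(" & keep & ")\]\]", "\$1 ") & "}"
' Put the result on the clipboard as RTF (not plain text) so Word interprets the control words.
Clipboard.SetText(cleanedRtf, TextDataFormat.Rtf)
oWord.ActiveDocument.ActiveWindow.Selection.PasteAndFormat(0)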
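One thing to watch when marking control words: a plain substring match on "\b" or "\ul" also hits longer control words such as \blue, \b0 and \ulnone, all of which can appear in RTF like the sample in the question. The small console sketch below illustrates the difference; the module name and the test string are made up for the demonstration.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ControlWordDemo
    Sub Main()
        Dim rtf As String = "{\colortbl ;\red255\green255\blue140;}\b bold\b0  and \ul underlined\ulnone"

        ' Naive substring replacement also rewrites \blue, \b0 and \ulnone.
        Dim naive As String = rtf.Replace("\ul", "[ul]").Replace("\b", "[b]")
        Console.WriteLine(naive)
        ' {\colortbl ;\red255\green255[b]lue140;}[b] bold[b]0  and [ul] underlined[ul]none

        ' The lookahead (?![a-z0-9]) only accepts a control word that ends here,
        ' and listing the longer alternatives first keeps \b0 and \ulnone whole.
        Dim careful As String = Regex.Replace(rtf, "\\(ulnone|ul|b0|b)(?![a-z0-9])", "[$1]")
        Console.WriteLine(careful)
        ' {\colortbl ;\red255\green255\blue140;}[b] bold[b0]  and [ul] underlined[ulnone]
    End Sub
End Module
```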