
How to convert a string to RTF in C#?

Tags:

c#

rtf

Question

How do I convert the string "Européen" to the RTF-formatted string "Europ\'e9en"?

[TestMethod]
public void Convert_A_Word_To_Rtf()
{
    // Arrange
    string word = "Européen";
    string expected = @"Europ\'e9en"; // verbatim string, so \' stays a literal backslash-apostrophe
    string actual = string.Empty;

    // Act
    // actual = ... // How?

    // Assert
    Assert.AreEqual(expected, actual);
}

What I have found so far

RichTextBox

RichTextBox can be used for certain things. Example:

RichTextBox richTextBox = new RichTextBox();
richTextBox.Text = "Européen";
string rtfFormattedString = richTextBox.Rtf;

But then rtfFormattedString turns out to be the entire RTF-formatted document, not just the string "Europ\'e9en".

Stack Overflow

  • Insert string with special characters into RTF
  • How to output unicode string to RTF (using C#)
  • Output RTF special characters to Unicode
  • Convert Special Characters for RTF (iPhone)

Google

I've also found a bunch of other resources on the web, but nothing quite solved my problem.

Answer

Brad Christie's answer

I had to add Trim() to remove the preceding space in the result. Other than that, Brad Christie's solution seems to work.

I'll run with this solution for now, even though I have a bad gut feeling about it, since we have to Substring and Trim the heck out of RichTextBox to get an RTF-formatted string.

Test case:

[TestMethod]
public void Test_To_Verify_Brad_Christies_Stackoverflow_Answer()
{
    Assert.AreEqual(@"Europ\'e9en", "Européen".ConvertToRtf());
    Assert.AreEqual(@"d\'e9finitif", "définitif".ConvertToRtf());
    Assert.AreEqual(@"\'e0", "à".ConvertToRtf());
    Assert.AreEqual(@"H\'e4user", "Häuser".ConvertToRtf());
    Assert.AreEqual(@"T\'fcren", "Türen".ConvertToRtf());
    Assert.AreEqual(@"B\'f6den", "Böden".ConvertToRtf());
}

Logic as an extension method:

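using System.Windows.Forms;

public static class StringExtensions
{
    public static string ConvertToRtf(this string value)
    {
        // RichTextBox is IDisposable, so release it once the RTF has been read.
        using (RichTextBox richTextBox = new RichTextBox())
        {
            richTextBox.Text = value;
            string rtf = richTextBox.Rtf;
            // Skip past the font selection "\f0\fs17" (8 characters) that precedes the text.
            int offset = rtf.IndexOf(@"\f0\fs17") + 8; // offset = 118;
            // The text ends at the final "\par".
            int len = rtf.LastIndexOf(@"\par") - offset;
            return rtf.Substring(offset, len).Trim();
        }
    }
}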
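
For comparison, here is a minimal sketch of the same conversion without RichTextBox. It is not part of the accepted approach. RtfEscaper and Escape are hypothetical names, and the sketch assumes the target document declares \ansicpg1252 and that the input is precomposed (for example, "é" as the single character U+00E9). Characters in the range U+00A0 to U+00FF have the same value in Windows-1252, so they can be written as \'hh; anything else outside ASCII is written with the RTF \uN escape followed by a "?" fallback character.

```csharp
using System;
using System.Text;

// Hypothetical helper, not from the original post.
public static class RtfEscaper
{
    public static string Escape(string value)
    {
        if (value == null) throw new ArgumentNullException(nameof(value));

        var sb = new StringBuilder(value.Length + 16);
        foreach (char c in value)
        {
            if (c == '\\' || c == '{' || c == '}')
            {
                // RTF syntax characters must be backslash-escaped.
                sb.Append('\\').Append(c);
            }
            else if (c < 0x80)
            {
                // Plain ASCII is copied through unchanged.
                sb.Append(c);
            }
            else if (c >= 0xA0 && c <= 0xFF)
            {
                // Same value in Unicode and Windows-1252: emit \'hh (two hex digits).
                sb.Append(@"\'").Append(((int)c).ToString("x2"));
            }
            else
            {
                // \uN takes a signed 16-bit decimal; '?' is the fallback for old readers.
                sb.Append(@"\u").Append(unchecked((short)c)).Append('?');
            }
        }
        return sb.ToString();
    }
}
```

With this sketch, RtfEscaper.Escape("Européen") yields Europ\'e9en, which matches the first assertion in the test case above. Line breaks are copied through as-is; mapping them to \par or \line is left out.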
Lernkurve asked Jan 25 '11 15:01



2 Answers

Doesn't RichTextBox always emit the same header and footer? If so, you could read the content from a fixed offset and keep using RichTextBox for the conversion. (I think; please correct me if I'm wrong.)

There are libraries available, but I've never had good luck with them personally (though I always found another method before fully exhausting the possibilities). In addition, most of the better ones usually carry a nominal fee.


EDIT
Kind of a hack, but this should get you through what you need to get through (I hope):

RichTextBox rich = new RichTextBox();
Console.Write(rich.Rtf);

String[] words = { "Européen", "Apple", "Carrot", "Touché", "Résumé", "A Européen eating an apple while writing his Résumé, Touché!" };
foreach (String word in words)
{
    rich.Text = word;
    Int32 offset = rich.Rtf.IndexOf(@"\f0\fs17") + 8;
    Int32 len = rich.Rtf.LastIndexOf(@"\par") - offset;
    Console.WriteLine("{0,-15} : {1}", word, rich.Rtf.Substring(offset, len).Trim());
}

EDIT 2

The RTF control codes used above break down as follows:

  • Header
    • \f0 - Use font 0, the first font in the font table. This is typically Microsoft Sans Serif, as declared in the header: {\fonttbl{\f0\fnil\fcharset0 Microsoft Sans Serif;}}
    • \fs17 - Font size, given in half-points (17 half-points is 8.5 pt)
  • Footer
    • \par - Marks the end of a paragraph.

Hopefully that clears some things up. ;-)
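
Because \fs17 depends on the control's font size, the hard-coded search string may not match on every machine. Below is a minimal sketch of a more tolerant lookup that accepts any \fsN value. RtfBodyExtractor and ExtractBody are hypothetical names, and the sketch assumes the RTF has the single-paragraph shape described above (font selection, text, trailing \par); other RichTextBox versions may emit extra control words that it does not handle.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not from the original answer.
public static class RtfBodyExtractor
{
    // Matches "\f0\fs" plus any size, and the single space that delimits the control word.
    private static readonly Regex BodyStart = new Regex(@"\\f0\\fs\d+ ?");

    public static string ExtractBody(string rtf)
    {
        if (rtf == null) throw new ArgumentNullException(nameof(rtf));

        Match start = BodyStart.Match(rtf);
        int end = rtf.LastIndexOf(@"\par", StringComparison.Ordinal);
        if (!start.Success || end < start.Index + start.Length)
        {
            throw new FormatException("Unexpected RTF layout.");
        }

        // The body sits between the font selection and the final \par.
        int offset = start.Index + start.Length;
        return rtf.Substring(offset, end - offset);
    }
}
```

Usage would be along the lines of RtfBodyExtractor.ExtractBody(rich.Rtf). Since the pattern consumes the delimiter space, no Trim() is needed and any leading spaces in the text itself are preserved.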

Brad Christie answered Sep 20 '22 13:09


I found a nice solution that actually uses the RichTextBox itself to do the conversion:

private static string FormatAsRTF(string DirtyText)
{
    System.Windows.Forms.RichTextBox rtf = new System.Windows.Forms.RichTextBox();
    rtf.Text = DirtyText;
    return rtf.Rtf;
}

http://www.baltimoreconsulting.com/blog/development/easily-convert-a-string-to-rtf-in-net/

Matthew Lock answered Sep 20 '22 13:09