UPDATE: I hope this post is helpful for coders using RichTextBoxes. The Match is correct for a normal string. What I did not see is that "ä" is transformed to "\'e4" in richTextBox.Rtf! So Match.Value is correct; this was human error.
A Regex finds the correct text, but Match.Value is wrong because it replaces the German "ä" with "\'e4"!
Let example_text = "<em>Primär-ABC</em>" and let's use the following code:
string example_text = "<em>Primär-ABC</em>";
Regex em = new Regex(@"<em>[^<]*</em>");
// Match emMatch = em.Match(example_text); // Works!
Match emMatch = em.Match(richTextBox.Rtf); // Fails!
while (emMatch.Success)
{
    string matchValue = emMatch.Value;
    Foo(matchValue); // ...
    emMatch = emMatch.NextMatch(); // advance to the next match so the loop terminates
}
Then emMatch.Value returns "Prim\'e4r-ABC" instead of "Primär-ABC".
The German ä is transformed to \'e4! Because I want to work with the exact string, I need emMatch.Value to be Primär-ABC. How do I achieve that?
In what context are you doing this?
string example_text = "<em>Ich bin ein Bärliner</em>";
Regex em = new Regex(@"<em>[^<]*</em>" );
Match emMatch = em.Match(example_text);
while (emMatch.Success)
{
Console.WriteLine(emMatch.Value);
emMatch = emMatch.NextMatch();
}
This outputs <em>Ich bin ein Bärliner</em> in my console.
The problem probably isn't that you're getting the wrong value back, it's that you're getting a representation of the value that isn't displayed correctly. This can depend on a lot of things. Try writing the value to a text file using UTF-8 encoding and see if it is still incorrect.
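To make that check concrete, here is a minimal sketch of writing a value to a UTF-8 file and reading it back. The class name Utf8Dump, the helper RoundTrip, and the file name match-value.txt are made up for illustration. The hard-coded string stands in for emMatch.Value.

```csharp
using System;
using System.IO;
using System.Text;

static class Utf8Dump
{
    // Hypothetical helper: writes the value as UTF-8, then reads it back
    // so the two strings can be compared independently of the console.
    public static string RoundTrip(string value, string path)
    {
        File.WriteAllText(path, value, Encoding.UTF8);
        return File.ReadAllText(path, Encoding.UTF8);
    }

    static void Main()
    {
        // Assumed file name in the temp directory, purely for illustration.
        string path = Path.Combine(Path.GetTempPath(), "match-value.txt");
        string value = "<em>Primär-ABC</em>"; // stand-in for emMatch.Value

        string readBack = RoundTrip(value, path);

        // If this reports equality, the string itself is intact and any
        // garbling seen elsewhere is a display or encoding issue.
        Console.WriteLine(readBack == value ? "value is intact" : "value differs");
    }
}
```

If the file shows the ä correctly when opened in an editor that understands UTF-8, the value itself was never wrong.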
Edit: Right. The thing is that you are getting the text from a WinForms RichTextBox using the Rtf property. This does not return the text as is; it returns the RTF representation of the text. RTF is not plain text, it's a markup format for displaying rich text. If you open an RTF document in e.g. Notepad, you will see that it has a lot of weird codes in it, including \'e4 for every 'ä' in your RTF document. If you had used some formatting (like bold text, color, etc.) in the RTF box, the .Rtf property would return that code as well, looking something like {\rtlch\fcs1 \af31507 \ltrch\fcs0 \cf6\insrsid15946317\charrsid15946317 test}
So use the .Text property instead. It will return the actual plain text.
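A rough sketch of the difference, runnable without a form: the two string literals below stand in for what richTextBox.Rtf and richTextBox.Text might return (the RTF literal is simplified and made up, not real control output). The helper DecodeHexEscapes is hypothetical and only illustrates what \'e4 means. It maps the hex byte straight to a char, which is only right for bytes in the Latin-1 range such as ä (0xE4), and it is not a substitute for using .Text.

```csharp
using System;
using System.Text.RegularExpressions;

static class RtfVersusText
{
    // Hypothetical helper: turns RTF \'hh escapes back into characters.
    // Assumption: the byte value equals the Unicode code point, which holds
    // for Latin-1 characters like ä but not for every ANSI code page byte.
    public static string DecodeHexEscapes(string rtfFragment)
    {
        return Regex.Replace(
            rtfFragment,
            @"\\'([0-9a-fA-F]{2})",
            m => ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString());
    }

    static void Main()
    {
        Regex em = new Regex(@"<em>[^<]*</em>");

        // Simplified stand-ins for richTextBox.Rtf and richTextBox.Text.
        string rtfLike = @"{\rtf1\ansi <em>Prim\'e4r-ABC</em>\par}";
        string plain = "<em>Primär-ABC</em>";

        string fromRtf = em.Match(rtfLike).Value;   // still contains \'e4
        string fromText = em.Match(plain).Value;    // contains the real ä

        Console.WriteLine(fromRtf);
        Console.WriteLine(fromText);
        Console.WriteLine(DecodeHexEscapes(fromRtf) == fromText);
    }
}
```

In the real application the simpler fix is the one above: pass richTextBox.Text to em.Match instead of richTextBox.Rtf.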