
How to prevent XMLReader from unescaping characters

I'd like to create a simple XmlReader that reads a complete node (including subnodes) as text:

string TXML = @"<xml><text>hall&#xF6;le</text></xml>";

XmlReader r = XmlReader.Create(new StringReader(TXML));
r.Read(); r.Read();

string o = r.ReadOuterXml();

ReadOuterXml does the job, but it unescapes the already escaped characters:

"<text>hallöle</text>"

I wish to have the result:

"<text>hall&#xF6;le</text>"

How can I avoid that 'unescaping'? I want to store these fragments in a database and need the escaping to be preserved. Furthermore, I don't want to parse and recreate the fragments.

user1410404 asked Dec 22 '25 23:12


2 Answers

I had a similar problem: I wanted to keep the escaped characters when reading from XML, but in my case ReadOuterXml() kept only some of them and transformed at least one (I got " instead of &quot;).

My solution was the following:

string TXML = @"<xml><text>hall&#xF6;le</text></xml>";
TXML = TXML.Replace("&", "&amp;");
XmlTextReader r = new XmlTextReader(new StringReader(TXML));
r.Read(); r.Read();
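// now we are at the text element
r.ReadStartElement();
// ReadContentAsString() returns "hall&#xF6;le" here because the '&' was pre-escaped above.
// SecurityElement.Escape also escapes '&' (to "&amp;"), so drop that call if you need
// the original entity text unchanged.
var content = SecurityElement.Escape(r.ReadContentAsString());
r.ReadEndElement();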
Maria Variu answered Dec 24 '25 13:12


I found two solutions. Neither is very nice, but maybe you can tell me which has fewer drawbacks.

Both solutions rely on directly using XmlTextReader instead of XmlReader. It comes with the property LinePosition, which led me to the first solution, and with the method ReadChars, which is the basis for the second one.

Solution (1), get data from original string via indices

Problems:

  • doesn't work on stream inputs
  • doesn't work if the XML spans several lines

Code

string TXML = @"<xml><data></data><rawnode at=""10 4""><text>hall&#xF6;le</text><z d=""2"">3</z></rawnode><data></data></xml>";

//XmlReader r = XmlReader.Create(new StringReader(TXML));
XmlTextReader r = new XmlTextReader(new StringReader(TXML));

// read to the node which shall be retrieved "raw"
while ( r.Read() )
{
    if ( r.Name.Equals("rawnode") )
        break;
}

// here we start
int Begin = r.LinePosition;
r.Skip();
int End = r.LinePosition;

// get it out
string output=TXML.Substring(Begin - 2, End - Begin);
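
The multi-line limitation listed above could probably be lifted by also using LineNumber and translating each (line, position) pair into an absolute offset. The following is only a sketch under assumptions: the class RawXml and the method ReadRawElement are hypothetical names, line breaks are assumed to be \n or \r\n, and the end of the element is located by searching backwards for the closing '>' rather than by relying on the exact position reported for the following node.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class RawXml
{
    // Hypothetical helper: returns the untouched source text of the first
    // element called `name`, or null if there is none.
    public static string ReadRawElement(string xml, string name)
    {
        // 0-based offset at which each line starts (assumes \n or \r\n line ends).
        var lineStarts = new List<int> { 0 };
        for (int i = 0; i < xml.Length; i++)
            if (xml[i] == '\n') lineStarts.Add(i + 1);

        using (var r = new XmlTextReader(new StringReader(xml)))
        {
            while (r.Read())
            {
                if (r.NodeType != XmlNodeType.Element || r.Name != name)
                    continue;

                // LinePosition is 1-based and points at the element name,
                // so the '<' sits one character earlier.
                int begin = lineStarts[r.LineNumber - 1] + r.LinePosition - 2;

                r.Skip(); // the reader is now on whatever follows the end tag

                int end;
                if (r.EOF)
                {
                    // Nothing follows: the element ends at the last '>' of the input.
                    end = xml.LastIndexOf('>') + 1;
                }
                else
                {
                    // Search backwards from the following node for the '>' of the end tag.
                    int next = lineStarts[r.LineNumber - 1] + r.LinePosition - 1;
                    end = xml.LastIndexOf('>', next - 1) + 1;
                }
                return xml.Substring(begin, end - begin);
            }
        }
        return null;
    }
}
```

Calling RawXml.ReadRawElement(TXML, "rawnode") would then replace the manual Begin/End arithmetic. It still needs the whole document as a string, so the stream limitation remains.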

Solution (2), get data with ReadChars

Problems:

  • I have to parse and recreate the 'outer' markup of the tag I'd like to read.
  • This might cost performance.
  • I might introduce errors.

Code:

// ... again create XmlTextReader and read to rawnode, then:
// here we start
int buflen = 15;
char[] buf = new char[buflen];
StringBuilder sb= new StringBuilder("<",20);

//get start tag and attributes    
string tagname=r.Name;
sb.Append(tagname);
bool hasAttributes = r.MoveToFirstAttribute();
while (hasAttributes)
{
    sb.Append(" " + r.Name + @"=""" + r.Value + @"""");
    hasAttributes = r.MoveToNextAttribute();
}
sb.Append(@">");
r.MoveToContent();

//get raw inner data    
int cnt;
while ((cnt = r.ReadChars(buf, 0, buflen)) > 0)
{
    // append only the characters actually read in this pass;
    // appending the whole buffer would add a '\0' and stale characters
    sb.Append(buf, 0, cnt);
}

//append end tag    
sb.Append("</" + tagname + ">");

// get it out
string output = sb.ToString();
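
One place where errors can creep in when recreating the outer markup is the attribute values: r.Value returns them unescaped, so a value containing & or " would yield invalid XML when written back as is. A minimal sketch of a safer start-tag builder follows. StartTagBuilder and Build are hypothetical names, and SecurityElement.Escape is used as one possible escaping routine. It writes named entities such as &amp;, so the original spelling of an entity in an attribute is not preserved, and an empty element like <z/> comes back as <z>.

```csharp
using System.Security;
using System.Text;
using System.Xml;

static class StartTagBuilder
{
    // Hypothetical helper: rebuilds the start tag of the element the reader
    // is currently positioned on, escaping attribute values on the way.
    public static string Build(XmlReader r)
    {
        var sb = new StringBuilder("<").Append(r.Name);
        if (r.MoveToFirstAttribute())
        {
            do
            {
                sb.Append(' ').Append(r.Name).Append("=\"")
                  .Append(SecurityElement.Escape(r.Value)).Append('"');
            } while (r.MoveToNextAttribute());

            // Return to the owning element so that ReadChars can be called next.
            r.MoveToElement();
        }
        return sb.Append('>').ToString();
    }
}
```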
user1410404 answered Dec 24 '25 14:12


