How to encode special characters in XML

Tags: c#, xml
My XML string contains a whole series of special characters:

&egrave; &rsquo; &rsquo; &rsquo; &ldquo; &rdquo; &rsquo &agrave; &agrave;

I need to replace these special characters before inserting the string into the DB. I tried System.Net.WebUtility.HtmlEncode without success. Can you help me?

string sql = "insert into rss (title, description, link, pubdate) values (?,?,?, " +
             " STR_TO_DATE(?, '%a, %d %b %Y %H:%i:%s GMT'));";

OdbcCommand command;
OdbcDataAdapter adpter = new OdbcDataAdapter();
connection.Open();
command = new OdbcCommand(sql, connection);
command.Parameters.AddWithValue("param1", System.Net.WebUtility.HtmlEncode(xmlTitle.InnerText.ToString()));
command.Parameters.AddWithValue("param2", System.Net.WebUtility.HtmlEncode(xmlDescription.InnerText.ToString()));
command.Parameters.AddWithValue("param3", System.Net.WebUtility.HtmlEncode(xmlLink.InnerText.ToString()));
command.Parameters.AddWithValue("param4", System.Net.WebUtility.HtmlEncode(xmlPubDate.InnerText.ToString()));
adpter.InsertCommand = command;
adpter.InsertCommand.ExecuteNonQuery();
connection.Close();
asked Apr 07 '14 08:04 by Hamamelis


2 Answers

You can use a native .NET method to escape special characters in text. There are only five special characters, so five Replace() calls would do the trick, but there is also something built in.

Example: converting "&" to "&amp;".

The native method is tucked away in the SecurityElement class: SecurityElement.Escape(string s) escapes your string and makes it XML-safe.

This matters, for example, when copying or writing data to InfoPath text fields: the text must first be escaped so that reserved characters become entities such as "&amp;".

Invalid XML characters and their replacements:

"<" to "&lt;"

">" to "&gt;"

"\"" to "&quot;"

"'" to "&apos;"

"&" to "&amp;"

The namespace is System.Security. See: http://msdn2.microsoft.com/en-us/library/system.security.securityelement.escape(VS.80).aspx
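As a rough illustration of the built-in method described above, the sketch below calls SecurityElement.Escape on a made-up sample string. The sample text and variable names are assumptions for demonstration only, and the snippet uses top-level statements (C# 9 or later); on older compilers the same lines would go inside a Main method.

```csharp
using System;
using System.Security;

// Hypothetical sample text containing all five reserved characters.
string raw = "Tom & Jerry <say> \"hi\" & 'bye'";

// Escape replaces <, >, ", ' and & with their predefined XML entities.
string escaped = SecurityElement.Escape(raw);
Console.WriteLine(escaped);

// Escape does not recognise existing entities: the ampersand of "&rsquo;"
// is escaped like any other ampersand.
string entity = SecurityElement.Escape("&rsquo;");
Console.WriteLine(entity);
```

Note that this only escapes reserved characters. It does not turn a named entity such as &rsquo; back into the character it stands for.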

The other option is to write your own extension methods:

// Extension methods must live in a static class; the class name here is arbitrary.
public static class XmlStringExtensions
{
  public static string EscapeXml( this string s )
  {
    string toxml = s;
    if ( !string.IsNullOrEmpty( toxml ) )
    {
      // replace literal values with entities;
      // "&" goes first so the entities added below are not escaped again
      toxml = toxml.Replace( "&", "&amp;" );
      toxml = toxml.Replace( "'", "&apos;" );
      toxml = toxml.Replace( "\"", "&quot;" );
      toxml = toxml.Replace( ">", "&gt;" );
      toxml = toxml.Replace( "<", "&lt;" );
    }
    return toxml;
  }

  public static string UnescapeXml( this string s )
  {
    string unxml = s;
    if ( !string.IsNullOrEmpty( unxml ) )
    {
      // replace entities with literal values;
      // "&amp;" goes last so an escaped "&amp;lt;" comes back as "&lt;", not "<"
      unxml = unxml.Replace( "&apos;", "'" );
      unxml = unxml.Replace( "&quot;", "\"" );
      unxml = unxml.Replace( "&gt;", ">" );
      unxml = unxml.Replace( "&lt;", "<" );
      unxml = unxml.Replace( "&amp;", "&" );
    }
    return unxml;
  }
}
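The replacement order in the two methods above matters, and a round trip shows why. The sketch below is only an illustration: it restates the same replacements as local functions (not extension methods) so the snippet stands alone, the sample string is made up, and it assumes top-level statements (C# 9 or later).

```csharp
using System;

// Same replacement order as the methods above: "&" first when escaping.
static string EscapeXml(string s)
{
    if (string.IsNullOrEmpty(s)) return s;
    return s.Replace("&", "&amp;")
            .Replace("'", "&apos;")
            .Replace("\"", "&quot;")
            .Replace(">", "&gt;")
            .Replace("<", "&lt;");
}

// "&amp;" last when unescaping, so "&amp;lt;" becomes "&lt;" and not "<".
static string UnescapeXml(string s)
{
    if (string.IsNullOrEmpty(s)) return s;
    return s.Replace("&apos;", "'")
            .Replace("&quot;", "\"")
            .Replace("&gt;", ">")
            .Replace("&lt;", "<")
            .Replace("&amp;", "&");
}

// Hypothetical input that already contains the literal text "&lt;".
string original = "5 < 6 & the text &lt; stays literal";
string escaped = EscapeXml(original);
string roundTripped = UnescapeXml(escaped);

Console.WriteLine(escaped);
Console.WriteLine(roundTripped == original);
```

If UnescapeXml replaced "&amp;" first, the literal "&lt;" in the sample would come back as "<" and the round trip would no longer match the input.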
answered Sep 19 '22 12:09 by Dmytro Khmara


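You can use HttpUtility.HtmlDecode or, with .NET 4.0+, WebUtility.HtmlDecode. Decoding, rather than encoding, turns entities such as &rsquo; back into the characters they represent before you insert the string.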
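A minimal sketch of the difference, assuming a made-up title that uses the same entities as the question. The sample string and variable names are hypothetical, and the snippet uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Net;

// Hypothetical feed title containing named HTML entities.
string title = "Caff&egrave; &ldquo;l&rsquo;Universit&agrave;&rdquo;";

// HtmlDecode converts each entity into the character it names.
string decoded = WebUtility.HtmlDecode(title);   // Caffè “l’Università”

// HtmlEncode, as used in the question, goes the other way and only
// escapes the ampersands again, so the entities survive in the database.
string encoded = WebUtility.HtmlEncode(title);   // Caff&amp;egrave; ...

Console.WriteLine(decoded);
Console.WriteLine(encoded);
```

In the question's code, the decoded value would be passed to command.Parameters.AddWithValue in place of the HtmlEncode call. This assumes the connection and the column character set can store these characters.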

answered Sep 17 '22 12:09 by Dallas