
How to tell if a string is xml?

Tags:

c#

xml

We have a string field which can contain XML or plain text. The XML has no <?xml declaration and no root element, i.e. it is not well formed.

We need to be able to redact XML data, emptying element and attribute values, leaving just their names, so I need to test if this string is XML before it's redacted.

Currently I'm using this approach:

string redact(string eventDetail)
{
    string detail = eventDetail.Trim();
    // Skip parsing unless the trimmed text both starts with '<' and ends with '>'.
    if (!detail.StartsWith("<") || !detail.EndsWith(">")) return eventDetail;
    ...

Is there a better way?

Are there any edge cases this approach could miss?

I appreciate I could use XmlDocument.LoadXml and catch XmlException, but this feels like an expensive option, since I already know that a lot of the data will not be in XML.

Here's an example of the XML data. Apart from the missing root element (omitted to save space, since there will be a lot of data), we can assume it is well formed:

<TableName FirstField="Foo" SecondField="Bar" /> 
<TableName FirstField="Foo" SecondField="Bar" /> 
...

Currently we are only using attribute based values, but we may use elements in the future if the data becomes more complex.

SOLUTION

Based on multiple comments (thanks guys!)

string redact(string eventDetail)
{
    if (string.IsNullOrEmpty(eventDetail)) return eventDetail; //+1 for unit tests :)
    string detail = eventDetail.Trim();
    // Skip parsing unless the trimmed text both starts with '<' and ends with '>'.
    if (!detail.StartsWith("<") || !detail.EndsWith(">")) return eventDetail;
    XmlDocument xml = new XmlDocument();
    try
    {
        xml.LoadXml(string.Format("<Root>{0}</Root>", detail));
    }
    catch (XmlException e)
    {
        log.WarnFormat("Data NOT redacted. Caught {0} loading eventDetail {1}", e.Message, eventDetail);
        return eventDetail;
    }
    ... // redact
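
The redaction step itself is elided above. As a rough sketch only, and not the original code, it might look something like the following. The names `Redactor` and `RedactFragment` are hypothetical, and the sketch assumes LINQ to XML (`XElement`) rather than `XmlDocument`, with "redact" taken to mean blanking attribute values and removing text content while keeping element and attribute names.

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;

public static class Redactor
{
    // Hypothetical helper: wraps the fragment in a temporary root element,
    // blanks every attribute value, removes every text node, and returns the
    // child elements without the wrapper.
    // Throws XmlException if the wrapped input is not well formed.
    public static string RedactFragment(string fragment)
    {
        XElement root = XElement.Parse("<Root>" + fragment + "</Root>");

        foreach (XElement element in root.Descendants())
        {
            foreach (XAttribute attribute in element.Attributes())
            {
                // Leave xmlns declarations alone so the markup stays valid.
                if (attribute.IsNamespaceDeclaration) continue;
                attribute.Value = string.Empty;
            }
        }

        // Copy to a list first: removing nodes while enumerating the tree is unsafe.
        // XCData derives from XText, so CDATA sections are removed as well.
        foreach (XText text in root.DescendantNodes().OfType<XText>().ToList())
        {
            text.Remove();
        }

        return string.Concat(
            root.Elements().Select(e => e.ToString(SaveOptions.DisableFormatting)));
    }
}
```

With the sample data from the question, the returned string keeps `TableName`, `FirstField` and `SecondField` but no longer contains `Foo` or `Bar`. Whitespace between the original elements is not preserved in this sketch.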
si618 asked Sep 29 '09 00:09




2 Answers

If you're going to accept not well formed XML in the first place, I think catching the exception is the best way to handle it.
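
A minimal sketch of the "just catch the exception" approach, under some assumptions. Because the data has no root element, this version uses `XmlReader` with `ConformanceLevel.Fragment` instead of wrapping the text in a root element. The names `XmlSniffer` and `IsXmlFragment` are hypothetical. Note that plain text on its own counts as a valid fragment, so the sketch also requires at least one element to have been read.

```csharp
using System.IO;
using System.Xml;

public static class XmlSniffer
{
    // Hypothetical helper: returns true when the text parses as an XML
    // fragment (no single root required) and contains at least one element.
    public static bool IsXmlFragment(string text)
    {
        if (string.IsNullOrWhiteSpace(text)) return false;

        var settings = new XmlReaderSettings
        {
            ConformanceLevel = ConformanceLevel.Fragment
        };

        bool sawElement = false;
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(text), settings))
            {
                // Reading to the end forces the whole input to be checked.
                while (reader.Read())
                {
                    if (reader.NodeType == XmlNodeType.Element) sawElement = true;
                }
            }
        }
        catch (XmlException)
        {
            // Malformed markup: treat it as plain text.
            return false;
        }

        return sawElement;
    }
}
```

The reader streams through the input without building a document tree, which may make it somewhat cheaper than `XmlDocument.LoadXml` when the only goal is a yes/no answer; whether that matters is worth measuring on real data.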

lod3n answered Sep 21 '22 14:09


One possibility is to mix both solutions. Keep the cheap check in your redact method and only try to load the string when it passes (i.e. inside the if). This way you only parse what is likely to be well-formed XML, and most non-XML entries are discarded without touching the parser.
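
The combination could be sketched roughly as below. `FragmentLoader`, `LooksLikeXml` and `TryLoadFragment` are hypothetical names, and the pre-check here assumes the intent is that the trimmed text must both start with '<' and end with '>' before the parser is tried.

```csharp
using System.Xml;

public static class FragmentLoader
{
    // Cheap pre-check: trimmed text must start with '<' AND end with '>'.
    public static bool LooksLikeXml(string text)
    {
        if (string.IsNullOrEmpty(text)) return false;
        string trimmed = text.Trim();
        return trimmed.Length >= 2
            && trimmed[0] == '<'
            && trimmed[trimmed.Length - 1] == '>';
    }

    // Runs the parser only on candidates that pass the pre-check.
    // On success, 'document' holds the fragment wrapped in a temporary <Root>.
    public static bool TryLoadFragment(string text, out XmlDocument document)
    {
        document = null;
        if (!LooksLikeXml(text)) return false; // most plain text stops here

        var candidate = new XmlDocument();
        try
        {
            candidate.LoadXml("<Root>" + text + "</Root>");
        }
        catch (XmlException)
        {
            // Passed the cheap check but is not well formed, e.g. "<3 you>".
            return false;
        }

        document = candidate;
        return true;
    }
}
```

The pre-check can still let through text that merely looks like markup, so the try/catch remains necessary; the check just keeps exceptions rare when most of the data is plain text.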

Samuel Carrijo answered Sep 21 '22 14:09