I'm trying to strip down some XML and extract only the value of a given field; however, the XML does not use literal less-than and greater-than signs. Taking a substring around the field name (Date in the example below) works fine.
&lt;my:Date xmlns:my="http://schemas.microsoft.com/office/infopath/2003/myXSD/2014-07-27T23:04:34"&gt;2014-08-15&lt;/my:Date&gt;
However, I am unable to take a substring around the less-than and greater-than signs. My code is as follows:
public string processReportXML(string field, string xml)
{
    try
    {
        string result = xml.Substring(xml.IndexOf(field));
        int resultIndex = result.LastIndexOf(field);
        if (resultIndex != -1) result = result.Substring(0, resultIndex);
        result = result.Substring(result.IndexOf(">"));
        resultIndex = result.IndexOf("<");
        if (resultIndex != -1) result = result.Substring(0, resultIndex);
        return field + ": " + result.Substring(4) + "\n";
    }
    catch (Exception e)
    {
        return field + " failed\n";
    }
}
It works fine in a test project, but in my actual web service I always get an error saying the index should be greater than 0. I have also tried using a regex to replace the characters, but that didn't work either.
result = Regex.Replace(result, "&(?!(amp|apos|quot|lt|gt);)", "hidoesthiswork?");
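You have HTML-encoded data: the angle brackets arrive as &lt; and &gt;, so IndexOf(">") returns -1 and the following Substring call throws.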
Add this at the beginning of your method for a simple solution:
xml = HttpUtility.HtmlDecode(xml);
You can also use WebUtility.HtmlDecode if you're using .NET 4.0+, as in this answer.
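As a rough illustration of why decoding first helps, here is a minimal sketch using WebUtility.HtmlDecode. The class name, the helper DecodeReportXml, and the shortened sample namespace are made up for the example; only the HtmlDecode call itself comes from the suggestion above.

```csharp
using System;
using System.Net;

public static class HtmlDecodeSketch
{
    // Hypothetical helper: turns &lt; / &gt; / &quot; etc. back into literal characters.
    public static string DecodeReportXml(string xml)
    {
        return WebUtility.HtmlDecode(xml);
    }

    public static void Main()
    {
        // Assumed input: the element as the web service would see it if it arrives HTML-encoded.
        string encoded = "&lt;my:Date xmlns:my=\"http://example.com/ns\"&gt;2014-08-15&lt;/my:Date&gt;";

        // There is no literal '>' in the encoded text, so the search fails with -1.
        Console.WriteLine(encoded.IndexOf('>'));

        // After decoding, the bracket is found and the existing Substring logic has something to work with.
        string decoded = DecodeReportXml(encoded);
        Console.WriteLine(decoded.IndexOf('>') >= 0);
        Console.WriteLine(decoded);
    }
}
```

If the test project was fed un-encoded XML while the web service receives the encoded form, that would explain why the same code behaves differently in the two places.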
In the long term, you should really be using an XML parser or something like LINQ to XML to access this data. Regexes and manual substring searches are not appropriate tools for this sort of structured data.
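A parser-based version might look roughly like the sketch below. It assumes the payload is a single well-formed XML document once decoded, and that matching on the element's local name (ignoring the my: namespace) is acceptable; the class name and the GetFieldValue helper are hypothetical, not part of the original code.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Xml.Linq;

public static class ReportXmlSketch
{
    // Hypothetical helper: returns the text of the first element whose
    // local name equals `field`, or null if no such element exists.
    public static string GetFieldValue(string field, string xml)
    {
        // Decode first, assuming the XML arrives HTML-encoded as in the question.
        XDocument doc = XDocument.Parse(WebUtility.HtmlDecode(xml));

        // Descendants() on the document includes the root element itself.
        XElement match = doc.Descendants()
                            .FirstOrDefault(e => e.Name.LocalName == field);
        return match?.Value;
    }

    public static void Main()
    {
        string encoded = "&lt;my:Date xmlns:my=\"http://schemas.microsoft.com/office/infopath/2003/myXSD/2014-07-27T23:04:34\"&gt;2014-08-15&lt;/my:Date&gt;";
        Console.WriteLine("Date: " + GetFieldValue("Date", encoded));
    }
}
```

Matching on LocalName avoids hard-coding the InfoPath namespace URI, which appears to include a timestamp; if several namespaces define a Date element, comparing against a full XName would be the stricter choice.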