
Handling illegal XML values while reading emails with EWS

We have an application that uses a StreamingSubscriptionConnection to read every email sent to a particular mailbox. Several times a day during development, I get the exception {"'{square character}', hexadecimal value 0x1F, is an invalid character. Line 1, position 1."}.

Here is the stack trace:

   at System.Xml.XmlTextReaderImpl.Throw(Exception e)
   at System.Xml.XmlTextReaderImpl.Throw(String res, String[] args)
   at System.Xml.XmlTextReaderImpl.Throw(Int32 pos, String res, String[] args)
   at System.Xml.XmlTextReaderImpl.ThrowInvalidChar(Int32 pos, Char invChar)
   at System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars)
   at System.Xml.XmlTextReaderImpl.ParseText()
   at System.Xml.XmlTextReaderImpl.ParseDocumentContent()
   at System.Xml.XmlTextReaderImpl.Read()
   at Microsoft.Exchange.WebServices.Data.EwsXmlReader.Read()
   at Microsoft.Exchange.WebServices.Data.EwsXmlReader.Read(XmlNodeType nodeType)
   at Microsoft.Exchange.WebServices.Data.EwsXmlReader.InternalReadElement(XmlNamespace xmlNamespace, String localName, XmlNodeType nodeType)
   at Microsoft.Exchange.WebServices.Data.EwsXmlReader.ReadStartElement(XmlNamespace xmlNamespace, String localName)
   at Microsoft.Exchange.WebServices.Data.ServiceRequestBase.ReadResponse(EwsServiceXmlReader ewsXmlReader)

How can I safely read emails with EWS that contain illegal characters?

After much searching it appears it was possible to fix this issue with older versions of the EWS API. However, with the newest version of the managed API no one seems to have a fix.

This is a cross post from http://social.technet.microsoft.com/Forums/en-US/exchangesvrdevelopment/thread/22863099-1d93-47ac-a11b-08c6bf7facea .

I've managed to get the exception again; here is the full stack trace and what Exchange is returning as a notification.

I'm using Exchange 2010 SP1.

EWS Notification Exception

Edit: I'm reviving this question, as it is causing me serious problems and the original question states the problem clearly. I am looking for client-side solutions that modify the behavior of the Managed EWS API to filter invalid characters from the XML and avoid exceptions. Exchange server fixes are unlikely to be an option, unless they are simple configuration changes. My software will be run against customer Exchange servers I do not control.

asked Jul 27 '11 12:07 by gcso


1 Answer

If you cannot stop the invalid XML from arriving, a general workaround is to write a utility program that watches the inbox folder(s) as a background process.

That program would do the following:

  • If a new file has been created and written completely (wait for it to be closed), open the file exclusively, then search for each invalid XML character and replace it with its XML escape sequence: &...;

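You should be able to safely replace all non-printable characters below US-ASCII 32 this way, with the exception of '\r', '\n' and '\t'. Note that XML 1.0 does not allow these control characters even as numeric character references, so a strict parser will still reject the escaped form; in that case the characters have to be removed or replaced with something harmless, such as a space. You also have to make sure that you never damage the XML files, and that whatever system consumes them can still use the changed files.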
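The replacement step described above can be sketched roughly as follows. This is only an illustration of the character filtering, written in Python for brevity; it is not part of the EWS Managed API, and the names `sanitize_xml`, `_RAW_ILLEGAL` and `_CHAR_REF` are made up for this example. It assumes the XML is already available as a string, and it only deals with control characters below 0x20, both raw and written as numeric references such as `&#x1F;`.

```python
import re
import xml.etree.ElementTree as ET

# Raw control characters that XML 1.0 forbids: everything below 0x20
# except tab (0x09), line feed (0x0A) and carriage return (0x0D).
_RAW_ILLEGAL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")

# Numeric character references such as &#x1F; or &#31;.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")

_ALLOWED_CONTROLS = {0x09, 0x0A, 0x0D}


def sanitize_xml(text, replacement=""):
    """Replace illegal control characters, raw or written as references.

    Hypothetical helper for illustration; it does not check surrogates
    or other code points above 0x1F.
    """

    def fix_reference(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if code_point < 0x20 and code_point not in _ALLOWED_CONTROLS:
            return replacement
        return match.group(0)  # leave legal references untouched

    text = _RAW_ILLEGAL.sub(replacement, text)
    return _CHAR_REF.sub(fix_reference, text)


# Made-up sample: one raw 0x1F and one 0x1F written as a reference.
raw = "<Subject>Q3\x1freport&#x1F;&#65;</Subject>"
clean = sanitize_xml(raw, " ")

# The cleaned string now parses; the raw one would raise ET.ParseError.
print(ET.fromstring(clean).text)
```

Replacing with a space rather than deleting keeps words from running together, at the cost of changing the text slightly; which is preferable depends on what consumes the result.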

Or look for one of the more common XML sanitizer libraries.

answered Oct 28 '22 00:10 by Sascha Wedler