
Non-valid Unicode/XML with Python SimpleXMLRPCServer?

I am getting the following error on the client side when I pass invalid XML characters to a Python SimpleXMLRPCServer:

Fault: <Fault 1: "<class 'xml.parsers.expat.ExpatError'>:not well-formed (invalid token): line 6, column 15">

Why? Do I have to change the SimpleXMLRPCServer library code to fix this?

Here is my XML-RPC server code:

from SimpleXMLRPCServer import SimpleXMLRPCServer

import logging
logging.basicConfig(level=logging.DEBUG)

def tt(text):
    return "cool"

server = SimpleXMLRPCServer(("0.0.0.0", 9000))
server.register_introspection_functions()
server.register_function(tt)

# Run the server's main loop
server.serve_forever()

Here is my XML-RPC client code:

import xmlrpclib

s = xmlrpclib.ServerProxy('http://localhost:9000')
s.tt(unichr(0x8))

On the server side, I don't get ANY error or traceback:

liXXXXXX.members.linode.com - - [06/Dec/2010 23:19:40] "POST /RPC2 HTTP/1.0" 200 -

Why no error on the server side? How do I diagnose what is going on?

And I get the following traceback on the client side:

/usr/lib/python2.6/xmlrpclib.pyc in __call__(self, *args)
   1197         return _Method(self.__send, "%s.%s" % (self.__name, name))
   1198     def __call__(self, *args):
-> 1199         return self.__send(self.__name, args)
   1200 
   1201 ##


/usr/lib/python2.6/xmlrpclib.pyc in __request(self, methodname, params)
   1487             self.__handler,
   1488             request,
-> 1489             verbose=self.__verbose
   1490             )
   1491 

/usr/lib/python2.6/xmlrpclib.pyc in request(self, host, handler, request_body, verbose)
   1251             sock = None
   1252 
-> 1253         return self._parse_response(h.getfile(), sock)
   1254 
   1255     ##


/usr/lib/python2.6/xmlrpclib.pyc in _parse_response(self, file, sock)
   1390         p.close()
   1391 
-> 1392         return u.close()
   1393 
   1394 ##


/usr/lib/python2.6/xmlrpclib.pyc in close(self)
    836             raise ResponseError()
    837         if self._type == "fault":
--> 838             raise Fault(**self._stack[0])
    839         return tuple(self._stack)
    840 

Fault: <Fault 1: "<class 'xml.parsers.expat.ExpatError'>:not well-formed (invalid token): line 6, column 15">

How do I get sane server-side processing if the input contains invalid XML? Can I clean up this data server side? How?

Joseph Turian asked Dec 07 '10 04:12


1 Answer

First, your example doesn't work for me, either. I'm not sure what you mean by "sane server-side processing if the input contains invalid XML" -- you send the server invalid XML, and it gives you back an error... what more do you want?

Second, if you stick a print 'hi there' in tt, you will see that tt is never called when you send unichr(0x8). The exact response from the server (a 200) is:

HTTP/1.0 200 OK
Server: BaseHTTP/0.3 Python/2.6.5
Date: Tue, 07 Dec 2010 07:33:09 GMT
Content-type: text/xml
Content-length: 350

<?xml version='1.0'?>
<methodResponse>
<fault>
<value><struct>
<member>
<name>faultCode</name>
<value><int>1</int></value>
</member>
<member>
<name>faultString</name>
<value><string>&lt;class 'xml.parsers.expat.ExpatError'&gt;:not well-formed (invalid token): line 6, column 15</string></value>
</member>
</struct></value>
</fault>
</methodResponse>

So, you see your error message.
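To confirm that the fault is produced before tt ever runs, the dispatch step can be exercised without a network. This is only a sketch, and it assumes Python 3, where xmlrpc.server and xmlrpc.client replace the Python 2 SimpleXMLRPCServer and xmlrpclib modules from the question. It also calls the dispatcher's underscore-prefixed _marshaled_dispatch method directly, which is an implementation detail and may change. The calls list is a hypothetical helper added only to record invocations.

```python
import xmlrpc.client
from xmlrpc.server import SimpleXMLRPCDispatcher

calls = []  # hypothetical helper: records every time tt actually runs


def tt(text):
    calls.append(text)
    return "cool"


dispatcher = SimpleXMLRPCDispatcher()
dispatcher.register_function(tt)

# The client-side marshaller only escapes &, < and >, so the raw
# 0x08 character ends up in the request body unchanged.
bad_request = xmlrpc.client.dumps(("\x08",), "tt").encode("utf-8")

# The dispatcher parses the request, catches the parser's exception,
# and returns a marshaled fault instead of raising.
response = dispatcher._marshaled_dispatch(bad_request)

# Unmarshaling a fault response raises xmlrpc.client.Fault,
# which is the exception the client sees.
fault = None
try:
    xmlrpc.client.loads(response)
except xmlrpc.client.Fault as exc:
    fault = exc

print(fault.faultCode, fault.faultString)
print("tt was called:", bool(calls))
```

Because the exception is caught inside the dispatcher and turned into a normal response body, the HTTP layer has nothing to report except a 200, which would explain the quiet server log.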

Now, according to the XML-RPC spec,

  • What characters are allowed in strings? Non-printable characters? Null characters? Can a "string" be used to hold an arbitrary chunk of binary data?

Any characters are allowed in a string except < and &, which are encoded as &lt; and &amp;. A string can be used to encode binary data.

Ok, but this is XML, and according to the XML spec:

Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646.

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

Which doesn't include 0x08, and seems to completely contradict the XML-RPC spec! So, it would seem that the XML spec is being implemented fairly rigorously by your XML parser (which, judging from the error, looks to be expat). Since XML doesn't allow 0x08, you can't send 0x08, and indeed, you get an error back.
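If dropping the offending characters is acceptable, one option is to filter strings against the Char production above before they are sent. Since the server's parser rejects the request before any of your code runs, the filtering has to happen on the client. A minimal sketch, assuming Python 3; strip_illegal_xml_chars is a hypothetical helper name, not part of any library:

```python
import re

# Matches any single code point that falls outside the XML 1.0 Char
# production: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
# | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text, replacement=""):
    """Return text with every character XML 1.0 forbids replaced."""
    return _ILLEGAL_XML_CHARS.sub(replacement, text)


cleaned = strip_illegal_xml_chars("back\x08space")
print(repr(cleaned))
```

Note that this changes the data: the receiver has no way of knowing a character was removed, so it is only appropriate when the control characters carry no meaning.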

If we do:

import xml.parsers.expat

data = "<?xml version='1.0'?>\n<methodCall>\n<methodName>tt</methodName>\n<params>\n<param>\n<value><string>\x08</string></value>\n</param>\n</params>\n</methodCall>"
p = xml.parsers.expat.ParserCreate()
p.Parse(data, True)

...we get your error. Again, you are passing garbage XML to the server, the server is passing you back an error message, and the Python in the middle is presenting that error to you as an exception. What behavior did you expect?
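If the control characters actually need to reach the server, one possible workaround is to send them as base64 rather than as a string. The sketch below assumes Python 3, where xmlrpc.client.Binary wraps bytes and is marshaled as a <base64> element. It round-trips the payload locally with dumps and loads rather than talking to a real server.

```python
import xml.parsers.expat
import xmlrpc.client

raw = b"\x08"

# Marshal the byte as a <base64> value instead of a <string>.
request = xmlrpc.client.dumps((xmlrpc.client.Binary(raw),), "tt")

# The request body now contains only legal XML characters,
# so expat parses it without raising.
parser = xml.parsers.expat.ParserCreate()
parser.Parse(request, True)

# Unmarshal as a server would; the parameter arrives as a Binary wrapper
# whose .data attribute holds the original bytes.
params, method = xmlrpc.client.loads(request)
received = params[0].data
print(method, received)
```

On the server, tt would then receive a Binary object and need to read its .data attribute, so both ends have to agree on this convention.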

Thanatos answered Nov 19 '22 11:11