Python standard library provides xml.sax.xmlreader.IncrementalParser
interface which has feed()
method. Jython also provides xml.sax
package that uses Java SAX parser implementation under the hood, but it seems not to provide IncrementalParser
.
Is there any way to incrementally parse chunks of XML in Jython? At the first glance I thought it can be achieved using coroutine like greenlet
, but I immediately realized that it can’t be used in Jython.
You can use StAX
. The StAX
parser streams like SAX
but maintains a cursor and allows you to extract content at the cursor by using hasNext()
and next()
.
The following code is adapted from this java example. Note this is my first attempt ever with jython, so don't hang me if I did something unconventionally, but the example works.
http://www.javacodegeeks.com/2013/05/parsing-xml-using-dom-sax-and-stax-parser-in-java.html
from javax.xml.stream import XMLStreamConstants, XMLInputFactory, XMLStreamReader
from java.io import ByteArrayInputStream;
from java.lang import String
xml = String(
"""<?xml version="1.0" encoding="ISO-8859-1"?>
<employees>
<employee id="111">
<firstName>Rakesh</firstName>
<lastName>Mishra</lastName>
<location>Bangalore</location>
</employee>
<employee id="112">
<firstName>John</firstName>
<lastName>Davis</lastName>
<location>Chennai</location>
</employee>
<employee id="113">
<firstName>Rajesh</firstName>
<lastName>Sharma</lastName>
<location>Pune</location>
</employee>
</employees>
""")
class Employee:
id = None
firstName = None
lastName = None
location = None
def __str__(self):
return self.firstName + " " + self.lastName + "(" + self.id + ") " + self.location
factory = XMLInputFactory.newInstance();
reader = factory.createXMLStreamReader(ByteArrayInputStream(xml.getBytes()))
employees = []
employee = None
tagContent = None
while reader.hasNext():
event = reader.next();
if event == XMLStreamConstants.START_ELEMENT:
if "employee" == reader.getLocalName():
employee = Employee()
employee.id = reader.getAttributeValue(0)
elif event == XMLStreamConstants.CHARACTERS:
tagContent = reader.getText()
elif event == XMLStreamConstants.END_ELEMENT:
if "employee" == reader.getLocalName():
employees.append(employee)
elif "firstName" == reader.getLocalName():
employee.firstName = tagContent
elif "lastName" == reader.getLocalName():
employee.lastName = tagContent
elif "location" == reader.getLocalName():
employee.location = tagContent
for employee in employees:
print employee
You may use the sax parser of Java directly.
from javax.xml.parsers import SAXParserFactory
factory = SAXParserFactory.newInstance()
xmlReader = XMLReaderFactory.createXMLReader()
from org.xml.sax.helpers import DefaultHandler
handler = DefaultHandler() # or use your own handler
xmlReader.setContentHandler(handler)
xmlReader.parse(new InputSource(streamReader))
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With