Python XML Parsing without root

Tags: python, parsing, xml, python-2.7, elementtree
I want to parse a fairly large XML-like file that has no root element. The format of the file is:

<tag1>
<tag2>
</tag2>
</tag1>

<tag1>
<tag3/>
</tag1>

What I tried:

I tried using ElementTree, but it returned a "no root" error. (Is there any other Python library that can parse this file?)
I tried adding an extra tag to wrap the entire file and then parsing it with ElementTree. However, I would prefer a more efficient method that does not require altering the original XML file.
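
For reference, the failure described above can be reproduced with a short sketch. It assumes Python 3 and only the standard library; the sample string and the helper name `try_parse` are illustrative and not taken from the real file. On current CPython the parser reports the second top-level element as junk after the document element rather than using the words "no root".

```python
import xml.etree.ElementTree as ET

# Two top-level elements and no enclosing root, as in the question.
rootless = "<tag1><tag2></tag2></tag1>\n<tag1><tag3/></tag1>"


def try_parse(text):
    """Return the parsed root element, or the error message if parsing fails."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)


# The raw text is rejected because XML allows exactly one root element.
print(try_parse(rootless))

# Wrapping the same text in a synthetic element makes it well-formed.
print(try_parse("<root>" + rootless + "</root>"))
```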


869

asked May 27 '14 13:05

sgp


2 Answers

lxml.html can parse fragments:

from lxml import html
s = """<tag1>
 <tag2>
 </tag2>
</tag1>

<tag1>
 <tag3/>
</tag1>"""
doc = html.fromstring(s)
for thing in doc:
    print(thing)
    for other in thing:
        print(other)
"""
>>> 
<Element tag1 at 0x3411a80>
<Element tag2 at 0x3428990>
<Element tag1 at 0x3428930>
<Element tag3 at 0x3411a80>
>>>
"""

Courtesy of this SO answer

And if there is more than one level of nesting:

def flatten(nested):
    """recursively flatten nested elements

    yields individual elements
    """
    for thing in nested:
        yield thing
        for other in flatten(thing):
            yield other

doc = html.fromstring(s)
for thing in flatten(doc):
    print(thing)

Similarly, lxml.etree.HTML will parse this. It adds html and body tags:

from lxml import etree

d = etree.HTML(s)
for thing in d.iter():
    print(thing)

""" 
<Element html at 0x3233198>
<Element body at 0x322fcb0>
<Element tag1 at 0x3233260>
<Element tag2 at 0x32332b0>
<Element tag1 at 0x322fcb0>
<Element tag3 at 0x3233148>
"""


76

answered Sep 29 '22 19:09

wwii

Instead of editing the file, how about wrapping its contents in a root element in memory, like this:

import xml.etree.ElementTree as ET

# open() works in both Python 2 and 3; file() exists only in Python 2
with open("xml-file.xml") as f:
    xml_object = ET.fromstringlist(["<root>", f.read(), "</root>"])
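
Since the file is described as fairly huge, the same wrapping idea can also be applied incrementally instead of reading everything with `f.read()`. What follows is only a sketch under some assumptions: Python 3.4+ (for `xml.etree.ElementTree.XMLPullParser`), a file opened in text mode, and no XML declaration at the top of the file. The function name `iter_top_level` and the `chunk_size` and `wrapper_tag` parameters are hypothetical names chosen for illustration.

```python
import io
import itertools
import xml.etree.ElementTree as ET


def iter_top_level(stream, chunk_size=64 * 1024, wrapper_tag="root"):
    """Yield each top-level element of a rootless XML text stream.

    A synthetic wrapper element is fed to the parser before and after
    the real data, so the source itself is never modified.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    pieces = itertools.chain(
        ["<%s>" % wrapper_tag],
        iter(lambda: stream.read(chunk_size), ""),  # stops at end of stream
        ["</%s>" % wrapper_tag],
    )
    depth = 0
    wrapper = None
    for piece in pieces:
        parser.feed(piece)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
                if depth == 1:
                    wrapper = elem  # the synthetic root
            else:
                depth -= 1
                if depth == 1:
                    # A direct child of the wrapper has just been closed.
                    yield elem
                    # Detach it so finished elements do not pile up in memory.
                    wrapper.remove(elem)


# Usage with an in-memory stand-in for the real file.
sample = "<tag1>\n<tag2>\n</tag2>\n</tag1>\n\n<tag1>\n<tag3/>\n</tag1>\n"
for top in iter_top_level(io.StringIO(sample)):
    print(top.tag, [child.tag for child in top])
```

With a real file, the stream would be something like `open("xml-file.xml")`. Each yielded element is detached from the wrapper once the caller moves on to the next one, so memory use should stay roughly proportional to the largest top-level element rather than to the whole file.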


20

answered Sep 29 '22 19:09

nettux
