HTML inside node using ElementTree

Question

I am using ElementTree to parse an XML file. Some fields contain HTML data. For example, consider a declaration like this:

<Course>
    <Description>Line 1<br />Line 2</Description>
</Course>

Now, suppose _course is an Element variable that holds this Course element. I want to access this course's description, so I do:

desc = _course.find("Description").text

But then desc only contains "Line 1". I read something about the .tail attribute, so I also tried:

desc = _course.find("Description").tail

And I get the same output. What should I do to make desc be "Line 1<br />Line 2" (or literally anything between <Description> and </Description>)? In other words, I'm looking for something similar to the .innerText property in C# (and many other languages, I guess).
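A short sketch of what Python 3's standard-library ElementTree does with this snippet (assuming it is parsed from a string): the <br /> becomes a child element of Description, so .text only holds the text before it, "Line 2" is stored in the <br /> element's .tail, and the Description element's own .tail is whatever follows </Description> (here just the newline before </Course>). The helpers inner_text and inner_xml are hypothetical names chosen for illustration; they roughly correspond to C#'s XmlNode.InnerText and XmlNode.InnerXml.

```python
import xml.etree.ElementTree as ET

SOURCE = """<Course>
    <Description>Line 1<br />Line 2</Description>
</Course>"""

course = ET.fromstring(SOURCE)
description = course.find("Description")

# .text is only the text *before* the first child element (<br />).
print(repr(description.text))     # 'Line 1'
# "Line 2" follows the <br /> element, so it is stored as that child's .tail.
print(repr(description[0].tail))  # 'Line 2'


def inner_text(elem):
    """Hypothetical helper: all text inside elem with the tags dropped."""
    return "".join(elem.itertext())


def inner_xml(elem):
    """Hypothetical helper: everything between elem's start and end tags.

    ET.tostring() on a child also writes that child's .tail, so joining
    the serialized children after elem.text reproduces the original content.
    """
    parts = [elem.text or ""]
    parts.extend(ET.tostring(child, encoding="unicode") for child in elem)
    return "".join(parts)


print(inner_text(description))  # Line 1Line 2
print(inner_xml(description))   # Line 1<br />Line 2
```

This only works because <br /> happens to be well-formed XML; ElementTree parses it as a real element rather than ignoring it.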

Dana the Sane · Accepted Answer

Do you have any control over the creation of the XML file? The contents of XML elements that contain XML tags (or similar markup), or markup characters ('<', etc.), should be encoded to avoid this problem. You can do this with any of:

a CDATA section
Base64 or some other encoding (which doesn't include XML reserved characters)
Entity encoding ('&lt;' for '<')
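If you control the producer, here is a rough sketch of the three options using only Python's standard library. ElementTree escapes special characters in .text automatically when serializing, its parser returns the contents of a CDATA section as ordinary .text (it has no API for writing CDATA sections, though), and Base64 goes through the base64 module. The payload and variable names are illustrative only.

```python
import base64
import xml.etree.ElementTree as ET

HTML = "Line 1<br />Line 2"

# 1. CDATA section: the parser hands back the raw markup as ordinary .text.
#    (ElementTree can read CDATA but cannot write it.)
cdata_doc = "<Course><Description><![CDATA[" + HTML + "]]></Description></Course>"
from_cdata = ET.fromstring(cdata_doc).find("Description").text

# 2. Entity encoding: assigning the HTML to .text makes ElementTree escape
#    '<' as '&lt;' (and '>' as '&gt;') when the document is serialized.
course = ET.Element("Course")
ET.SubElement(course, "Description").text = HTML
entity_doc = ET.tostring(course, encoding="unicode")
from_entities = ET.fromstring(entity_doc).find("Description").text

# 3. Base64: the encoded payload contains no XML-reserved characters.
encoded = base64.b64encode(HTML.encode("utf-8")).decode("ascii")
b64_doc = "<Course><Description>" + encoded + "</Description></Course>"
b64_text = ET.fromstring(b64_doc).find("Description").text
from_base64 = base64.b64decode(b64_text).decode("utf-8")

print(entity_doc)
# <Course><Description>Line 1&lt;br /&gt;Line 2</Description></Course>
print(from_cdata == from_entities == from_base64 == HTML)  # True
```

Entity encoding is usually the least effort when the producer already uses an XML library, since the escaping happens without any extra code.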

If you can't make these changes, and ElementTree can't ignore tags not included in the XML schema, then you will have to pre-process the file. Of course, you're out of luck if the schema's tag names overlap with HTML's.
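One possible pre-processing step, sketched under the assumption that the HTML only appears inside <Description> elements that carry no attributes and are never nested: wrap each element's raw contents in a CDATA section with a regular expression before parsing. This also rescues HTML that is not well-formed XML, such as a bare <br>, which would otherwise make the parser fail. The function name wrap_in_cdata is hypothetical, and a regex is a blunt tool here; it will misbehave if the tag also appears in comments or is nested.

```python
import re
import xml.etree.ElementTree as ET


def wrap_in_cdata(raw_xml, tag):
    """Hypothetical helper: wrap the contents of every <tag>...</tag> in CDATA.

    Assumes <tag> has no attributes and is never nested inside itself.
    """
    pattern = re.compile(
        "(<" + re.escape(tag) + ">)(.*?)(</" + re.escape(tag) + ">)",
        flags=re.DOTALL,
    )

    def wrap(match):
        # A literal "]]>" would end the CDATA section early, so split it
        # across two adjacent CDATA sections.
        body = match.group(2).replace("]]>", "]]]]><![CDATA[>")
        return match.group(1) + "<![CDATA[" + body + "]]>" + match.group(3)

    return pattern.sub(wrap, raw_xml)


# "<br>" is valid HTML but not well-formed XML, so this fails to parse as-is.
raw = "<Course><Description>Line 1<br>Line 2</Description></Course>"
try:
    ET.fromstring(raw)
except ET.ParseError as err:
    print("unparseable:", err)

course = ET.fromstring(wrap_in_cdata(raw, "Description"))
print(course.find("Description").text)  # Line 1<br>Line 2
```

For anything beyond a single, predictable tag, a real HTML-aware pre-processor would be safer than a regex.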

Tags: python, html, xml, elementtree

Rafael Almeida
