I am using ElementTree to parse an XML file. Some fields contain HTML data. For example, consider a declaration as follows:
<Course>
<Description>Line 1<br />Line 2</Description>
</Course>
Now, suppose _course is an Element variable which holds this Course element. I want to access this course's description, so I do:
desc = _course.find("Description").text
But then desc only contains "Line 1". I read something about the .tail attribute, so I tried also:
desc = _course.find("Description").tail
And I get the same output. What should I do to make desc be "Line 1<br />Line 2" (or literally anything between <Description> and </Description>)? In other words, I'm looking for something similar to the .innerText property in C# (and many other languages I guess).
Do you have any control over the creation of the XML file? The contents of XML tags which contain XML tags (or similar), or markup characters ('<', etc.), should be encoded to avoid this problem. One way to do this is entity encoding ('<' == '&lt;').
If you can't make these changes, and ElementTree can't ignore tags not included in the XML schema, then you will have to pre-process the file. Of course, you're out of luck if the schema overlaps HTML.
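To make the encoding idea concrete, here is a minimal sketch, not a definitive fix. It assumes the description is the string from the question, uses `xml.sax.saxutils.escape` on the producer side, and introduces a hypothetical helper `inner_xml` for the case where the file cannot be changed; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Case 1: the producer entity-encodes the HTML before embedding it.
html = "Line 1<br />Line 2"
encoded_doc = "<Course><Description>" + escape(html) + "</Description></Course>"
course = ET.fromstring(encoded_doc)
# The parser decodes &lt; and &gt;, so .text holds the whole HTML string.
desc_encoded = course.find("Description").text


# Case 2: the file cannot be changed, so <br /> is parsed as a child element.
def inner_xml(element):
    """Hypothetical helper: the element's text plus its serialized children."""
    parts = [element.text or ""]
    for child in element:
        # tostring() also emits the child's tail text ("Line 2" here).
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


raw_doc = "<Course><Description>Line 1<br />Line 2</Description></Course>"
description = ET.fromstring(raw_doc).find("Description")
desc_raw = inner_xml(description)             # markup preserved
desc_text = "".join(description.itertext())   # text only, tags dropped
```

In the second case the text after `<br />` is stored in the `.tail` of the `br` child, not in the tail of `Description`, which is why reading `.text` alone stops at "Line 1". Note that `inner_xml` re-serializes the children, so the result may differ from the source in whitespace or attribute quoting.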