I have this example XML file:
<page>
<title>Chapter 1</title>
<content>Welcome to Chapter 1</content>
</page>
<page>
<title>Chapter 2</title>
<content>Welcome to Chapter 2</content>
</page>
I would like to extract the contents of the title and content tags.
Which approach is better for this: pattern matching or an XML module? Or is there a better way to extract the data?
There are two ways to parse XML with the ElementTree module: the parse() function and the fromstring() function. parse() reads an XML document supplied as a file (a filename or file object) and returns an ElementTree, whereas fromstring() parses XML supplied as a string, for example a triple-quoted literal, and returns the root element directly.
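As a rough illustration of the difference, the sketch below runs both functions on the same small document. The names XML_TEXT, titles_from_string and titles_from_file are made up for this example, and the temporary file only stands in for a real XML file on disk.

```python
import os
import tempfile
from xml.etree import ElementTree as ET

# Hypothetical sample document; note the single <root> wrapper element.
XML_TEXT = """<root>
  <page>
    <title>Chapter 1</title>
    <content>Welcome to Chapter 1</content>
  </page>
</root>"""


def titles_from_string(xml_text):
    # fromstring() returns the root Element directly.
    root = ET.fromstring(xml_text)
    return [page.findtext("title") for page in root.findall("page")]


def titles_from_file(path):
    # parse() returns an ElementTree; getroot() gives the root Element.
    root = ET.parse(path).getroot()
    return [page.findtext("title") for page in root.findall("page")]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.xml")
        with open(path, "w", encoding="utf-8") as f:
            f.write(XML_TEXT)
        print(titles_from_string(XML_TEXT))
        print(titles_from_file(path))
```

Both calls end up with the same root element, so the choice mostly depends on whether the XML is already in memory or still in a file.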
There is already a built-in XML library, notably ElementTree. For example:
>>> from xml.etree import ElementTree as ET  # cElementTree was removed in Python 3.9
>>> xmlstr = """
... <root>
... <page>
... <title>Chapter 1</title>
... <content>Welcome to Chapter 1</content>
... </page>
... <page>
... <title>Chapter 2</title>
... <content>Welcome to Chapter 2</content>
... </page>
... </root>
... """
>>> root = ET.fromstring(xmlstr)
>>> for page in list(root):
...     title = page.find('title').text
...     content = page.find('content').text
...     print('title: %s; content: %s' % (title, content))
...
title: Chapter 1; content: Welcome to Chapter 1
title: Chapter 2; content: Welcome to Chapter 2
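Code:
# cElementTree was removed in Python 3.9; ElementTree is the supported module
from xml.etree import ElementTree as ET

# test.xml must have a single root element wrapping the <page> elements
tree = ET.parse("test.xml")
root = tree.getroot()
for page in root.findall('page'):
    print("Title:", page.find('title').text)
    print("Content:", page.find('content').text)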
Output:
Title: Chapter 1
Content: Welcome to Chapter 1
Title: Chapter 2
Content: Welcome to Chapter 2
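Note that the file in the question has no single root element, so ET.parse() or ET.fromstring() would reject it as it stands. One possible workaround, sketched below, is to wrap the text in a synthetic root before parsing. The helper names parse_fragments and extract_pages are hypothetical, and the sketch assumes the text has no XML declaration or DOCTYPE at the top.

```python
from xml.etree import ElementTree as ET

# The question's content as-is: several top-level <page> elements, no root.
FRAGMENTS = """<page>
<title>Chapter 1</title>
<content>Welcome to Chapter 1</content>
</page>
<page>
<title>Chapter 2</title>
<content>Welcome to Chapter 2</content>
</page>"""


def parse_fragments(text, wrapper="root"):
    # Assumption: text contains no XML declaration or DOCTYPE, so it can be
    # safely enclosed in a single synthetic wrapper element.
    return ET.fromstring("<{0}>{1}</{0}>".format(wrapper, text))


def extract_pages(text):
    root = parse_fragments(text)
    # findtext() returns None instead of raising if a child tag is missing.
    return [(page.findtext("title"), page.findtext("content"))
            for page in root.findall("page")]


if __name__ == "__main__":
    for title, content in extract_pages(FRAGMENTS):
        print("Title:", title)
        print("Content:", content)
```

For a file on disk, the text could be read with open(...).read() and passed to extract_pages in the same way.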