In all the examples and tutorials I have seen of BeautifulSoup, an HTML/XML document is passed in and a soup object is returned, which can then be used to modify the document. However, how can I use BeautifulSoup to create an HTML/XML document from scratch? In other words, I have data that I would like to put in an XML file, but the file does not exist yet and I would like to build it from nothing. How can I go about it?
bs4: Beautiful Soup is a Python library for pulling data out of HTML and XML files.
To read an XML file using ElementTree, first import the ElementTree module from the xml.etree package under the name ET (a common convention). Then pass the filename of the XML file to ET.parse() to parse it, and get the root (the top-level parent tag) of the document by calling getroot() on the returned tree.
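The ElementTree steps above can be sketched as follows. This is only a minimal illustration: the sample document, the `library` and `book` tag names, and the temporary file path are made up for the example and do not come from the original post.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Write a small sample file so the example is self-contained.
# The content and tag names here are hypothetical.
sample = "<library><book title='Dune'/><book title='Emma'/></library>"
path = os.path.join(tempfile.mkdtemp(), "sample.xml")
with open(path, "w", encoding="utf-8") as f:
    f.write(sample)

# Parse the file and fetch the root element.
tree = ET.parse(path)
root = tree.getroot()

# Collect the title attribute of every direct <book> child of the root.
titles = [book.get("title") for book in root.findall("book")]
print(root.tag, titles)
```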
Just create an empty BeautifulSoup() object:
from bs4 import BeautifulSoup
soup = BeautifulSoup()
and start adding elements:
soup.append(soup.new_tag("a", href="http://www.example.com"))
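As a slightly fuller sketch of the same idea, tags created with new_tag() can be nested with append() and given text through the .string attribute. The document structure and link text below are invented for illustration, and the parser is named explicitly as "html.parser" (the one bundled with Python) rather than left to be guessed.

```python
from bs4 import BeautifulSoup

# Start from an empty document; naming the parser avoids relying on a default.
soup = BeautifulSoup("", "html.parser")

# Create the elements (this structure is a made-up example).
html = soup.new_tag("html")
body = soup.new_tag("body")
link = soup.new_tag("a", href="http://www.example.com")
link.string = "Example"  # sets the text inside the <a> tag

# Nest them from the inside out, then attach the tree to the soup.
body.append(link)
html.append(body)
soup.append(html)

print(soup.prettify())
```

Calling str(soup) instead of prettify() gives the markup without added indentation, which is usually what you want when writing the result to a file.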
For XML, you can start out with an XML declaration by using the xml tree builder:
soup = BeautifulSoup(features='xml')
This requires lxml to be installed first. It sets the .is_xml flag on the BeautifulSoup object (which can also be set manually).
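Putting this together for the XML case in the question, a rough sketch might look like the following. It assumes lxml is installed; the `catalog` and `item` tag names, the `price` attribute, and the sample data are hypothetical stand-ins for whatever data you actually have.

```python
from bs4 import BeautifulSoup

# Empty XML document; features="xml" needs lxml to be installed.
soup = BeautifulSoup(features="xml")

# Hypothetical data to serialize.
products = [("pen", "1.50"), ("notebook", "3.20")]

# Build a root element and one child element per record.
root = soup.new_tag("catalog")
soup.append(root)
for product_name, price in products:
    item = soup.new_tag("item", price=price)
    item.string = product_name
    root.append(item)

# Serialize; for an XML soup the output begins with an XML declaration.
xml_text = str(soup)
print(xml_text)
```

The resulting string can then be written to a new file with an ordinary open(..., "w") call.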