Following is the XML file : book.xml
<?xml version="1.0" ?>
<!--Sample XML Document-->
<bookstore>
<book _id="E7854">
<title>
Sample XML Book
</title>
<author>
<name _id="AU363">
<first>
Benjamin
</first>
<last>
Smith
</last>
</name>
<affiliation>
A
</affiliation>
</author>
<chapter number="1">
<title>
First Chapter
</title>
<para>
B
<count>
783
</count>
.
</para>
</chapter>
<chapter number="3">
<title>
Third Chapter
</title>
<para>
B
<count>
59
</count>
.
</para>
</chapter>
</book>
<book _id="C843">
<title>
XML Master
</title>
<author>
<name _id="AU245">
<first>
John
</first>
<last>
Doe
</last>
</name>
<affiliation>
C
</affiliation>
</author>
<chapter number="2">
<title>
Second Chapter
</title>
<para>
K
<count>
54
</count>
.
</para>
</chapter>
<chapter number="3">
<title>
Third Chapter
</title>
<para>
K
<count>
328
</count>
.
</para>
</chapter>
<chapter number="7">
<title>
Seventh Chapter
</title>
<para>
K
<count>
265
</count>
.
</para>
</chapter>
<chapter number="9">
<title>
Ninth Chapter
</title>
<para>
K
<count>
356
</count>
.
</para>
</chapter>
</book>
</bookstore>
Following is the Python code : book_dom.py
from xml.dom import minidom, Node
import re, textwrap
class SampleScanner:
def __init__(self, doc):
for child in doc.childNodes:
if child.nodeType == Node.ELEMENT_NODE and child.tagName == 'bookstore':
self.handleBookStore(child)
def gettext(self, nodelist):
retlist = []
for node in nodelist:
if node.nodeType == Node.TEXT_NODE:
retlist.append(node.wholeText)
elif node.hasChildNodes:
retlist.append(self.gettext(node.childNodes))
return re.sub('\s+', ' ', ''.join(retlist))
def handleBookStore(self, node):
for child in node.childNodes:
if child.nodeType != Node.ELEMENT_NODE:
continue
if child.tagName == 'book':
self.handleBook(child)
def handleBook(self, node):
for child in node.childNodes:
if child.nodeType != Node.ELEMENT_NODE:
continue
if child.tagName == 'title':
print "Book title is:", self.gettext(child.childNodes)
if child.tagName == 'author':
self.handleAuthor(child)
if child.tagName == 'chapter':
self.handleChapter(child)
def handleAuthor(self, node):
for child in node.childNodes:
if child.nodeType != Node.ELEMENT_NODE:
continue
if child.tagName == 'name':
self.handleAuthorName(child)
elif child.tagName == 'affiliation':
print "Author affiliation:", self.gettext([child])
def handleAuthorName(self, node):
surname = self.gettext(node.getElementsByTagName("last"))
givenname = self.gettext(node.getElementsByTagName("first"))
print "Author Name: %s, %s" % (surname, givenname)
def handleChapter(self, node):
print " *** Start of Chapter %s: %s" % (node.getAttribute('number'),
self.gettext(node.getElementsByTagName('title')))
for child in node.childNodes:
if child.nodeType != Node.ELEMENT_NODE:
continue
if child.tagName == 'para':
self.handlePara(child)
def handlePara(self, node):
partext = self.gettext([node])
partext = textwrap.fill(partext)
print partext
print
doc = minidom.parse('book.xml')
SampleScanner(doc)
Output :~/$ python book_dom.py
Book ID : E7854
Book title is: Sample XML Book
Name ID : AU363
Author Name: Smith , Benjamin
Author affiliation: A
*** Start of Chapter 1: First Chapter
B 783 .
*** Start of Chapter 3: Third Chapter
B 59 .
Book ID : C843
Book title is: XML Master
Name ID : AU245
Author Name: Doe , John
Author affiliation: C
*** Start of Chapter 2: Second Chapter
K 54 .
*** Start of Chapter 3: Third Chapter
K 328 .
*** Start of Chapter 7: Seventh Chapter
K 265 .
*** Start of Chapter 9: Ninth Chapter
K 356 .
My aim is to store the Books in "Books" table and Author info in "Authors" table (preserving the book -> author relationship) [MySQL DB].
**Book table :**
id |title
E7854 Sample XML Book
....
**Chapter table :**
book_id|chapter_number|title |para
E7854 1 First Chapter B 783 .
E7854 3 Third Chapter B 59 .
....
**Author table :**
id |book_id |name |Affiliation
AU363 E7854 Smith Benjamin A
....
How do I go about storing the data in the database if I have few thousand books and authors (and chapters)? I am having trouble with uniquely identifying the dataset for each book/author. I can use the IDs and pass them to the functions to preserve the relation but I am not sure if that is the best way to do it. Any pointers are highly appreciated.
p.s : I am working on the SQL part of the script and will update once I test it. Feel free to post your thoughts, code samples. Thanks!
Write XML to the writer object. The writer should have a write() method which matches that of the file object interface. The indent parameter is the indentation of the current node. The addindent parameter is the incremental indentation to use for subnodes of the current one.
The most common way to store XML in MySQL is to use the LOAD_FILE() function to open an entire XML document, store it in a variable, and insert the variable into a table column. Here is the client_citizenship table again, with a simplified structure.
Python XML Parsing Modules Python allows parsing these XML documents using two modules namely, the xml. etree. ElementTree module and Minidom (Minimal DOM Implementation).
Based on your comment above, I would simply create a book class, an author class, an author list, and a chapter class. Assign the chapters of the book to a list of Chapter objects on the Book itself. Maintain the AuthorList as a dict of their IDs, pointing to the actual Author objects. Use a data member of the Book object to contain the ID; you can provide a method to pull the author out of the AuthorList dict for convenience.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With