I'm trying to figure out the best way to accomplish the following:
For the first part, is this something that would need to be done manually, or could it be accomplished with a cron?
Most of the questions and answers related to XML and relational databases refer to Python or PHP. Could this be done with javascript/nodejs as well?
If this question is better suited for a different StackExchange forum, please let me know and I will move it there instead.
Below is a sample of the xml code:
<case-file>
<serial-number>123456789</serial-number>
<transaction-date>20150101</transaction-date>
<case-file-header>
<filing-date>20140101</filing-date>
</case-file-header>
<case-file-statements>
<case-file-statement>
<code>AQ123</code>
<text>Case file statement text</text>
</case-file-statement>
<case-file-statement>
<code>BC345</code>
<text>Case file statement text</text>
</case-file-statement>
</case-file-statements>
<classifications>
<classification>
<international-code-total-no>1</international-code-total-no>
<primary-code>025</primary-code>
</classification>
</classifications>
</case-file>
Here's some more information about how these files will be used:
All XML files will be in the same format. There are probably a few dozen elements within each record. The files are updated by a third party on a daily basis (and are available as zipped files on the third-party website). Each day's file represents new case files as well as updated case files.
The goal is to allow a user to search for information and organize those search results on the page (or in a generated pdf/excel file). For example, a user might want to see all case files that include a particular word within the <text>
element. Or a user might want to see all case files that include primary code 025 (<primary-code>
element) and that were filed after a particular date (<filing-date>
element).
The only data entered into the database will be from the XML files--users won't be adding any of their own information to the database.
You can create a join view based on the XML file to converts the hierarchical XML data to a relational structure. You can then use the join view as a table in a single-table constraint or table pair. To create the join view, you create a self join based on the XML file.
Often times, the large size of XML structures is due to the fact that they are an XML representation of a database dump. There might be redundant or even useless information that you could discard with an XSLT transformation.
"You can store XML in a database designed specifically for XML, in a modified object database, or in a relational database."
All steps could certainly be accomplished using node.js
. There are modules available that will help you with each of these tasks:
xml-stream: allows you to stream a file and register events that fire when the parser encounters certain XML elements. I have successfully used this module to parse KML files (granted they were significantly smaller than your files).
node-postgres: node client for PostgreSQL (I am sure there are clients for many other common RDBMS, PG is the only one I have used so far).
Most of these modules have pretty great examples that will get you started. Here's how you would probably set up the XML streaming part:
var XmlStream = require('xml-stream');
var xml = fs.createReadStream('path/to/file/on/disk'); // or stream directly from your online source
var xmlStream = new XmlStream(xml);
xmlStream.on('endElement case-file', function(element) {
// create and execute SQL query/queries here for this element
});
xmlStream.on('end', function() {
// done reading elements
// do further processing / query database, etc.
});
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With