I have encountered a problem that I need to import a huge XML (> 1Gb) into SQL Server 2008 daily. What I have now is a sample XML file and the XML schema of it. The XML schema is pretty complex which contains many custom defined simple type, and element with complex type such as:
<xs:element name="xxxx_url">
<xs:complexType>
<xs:simpleContent>
<xs:extension base="xs:anyURI">
<xs:attribute ref="target" use="optional"/>
<xs:attribute ref="abc" use="optional"/>
</xs:extension>
</xs:simpleContent>
</xs:complexType>
</xs:element>
After import, a WCF service will be implemented to retrieve the data stored in SQL Sever, something like search, retrieve etc (read-only operations).
The implementation steps I can think of are like:
The problem is that these steps involve lots of manual work. I have to research the XSD row by row, and transform it to object model and database schema mannualy.
I did some research that there're some automation tools to transform XSD into classes, and also transform XSD into database schema. But the classes transformed from XSD with the tool are pretty messed, and the transformation to schema is failed because it does not conform to MS dataset format.
I am wondering is there any good solution to this problem, to save a lot of manual work?
Any suggestion is appreciated !
When you bulk import XML data from a file that contains an encoding declaration that you want to apply, specify the SINGLE_BLOB option in the OPENROWSET(BULK...) clause. The SINGLE_BLOB option ensures that the XML parser in SQL Server imports the data according to the encoding scheme specified in the XML declaration.
SQL Server lets you retrieve data as XML by supporting the FOR XML clause, which can be included as part of your query. You can use the FOR XML clause in the main (outer) query as well as in subqueries. The clause supports numerous options that let you define the format of the XML data.
At some point you have to do the transformation. Whether you do so reading the XML into objects or into data in tables. The work should be done once and then you just have to run the resulting process. The issues I see are:
The XML is very large.
You do not yet have a mapping of the XSD to your desired schema.
The mapping is work that you are going to have to do. I would think it would perform best if you can import the XSD into table and then import from those temporary table to the schema you wish to use. Working with the XML file will give you issues due to its size.
So my suggestion is to force/fudge the import of the XML into what ever table structure will work. Then write a stored procedure to "import" the data from those tables into your "real" schema.
Pat O
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With