Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Importing XML data into MS SQL server programmatically

I have 5 large XML files which I am keen to analyse. All of them are too large to open in a text editor and so I do not know their XML schemas.

I have tried to import them into SQL server, however the process has given me an error even though I am pretty sure they are valid, as they were sourced from very reputable programmers.

I have also tried other methods but each struggles with the large file sizes (MySQL) or state that the files contain invalid XML characters (Access & Excel).

How can I read and insert the data programmatically? Can this be done via SQL query?

Thanks a lot!

like image 226
Jon Winstanley Avatar asked Jun 05 '09 21:06

Jon Winstanley


4 Answers

As of 2013...

The only time saving option in my opinion to load large/huge XML files in SQL Server is (as someone previously briefly mentioned) to use the SQLXML 4.0 library.

This is the solution I adopted to load huge XML files (7GB in size) on a daily basis. The previous process which was using C# manipulation in the Script Task took hours to complete. Using SQLXML 4.0 takes 15-20 minutes. How to install SQLXML 4.0. step by step here. For practical examples in how to do it end to end follow this MSDN link.

My XML has also nested elements, so it's quite complex, the result is 10 tables with 2.5 to 4 million rows each (the daily file sometimes is more than 7GB). My work was based purely on information I learned and applied from the two links provided above.

  • Advantages:

    • it's fast
    • it's Microsoft (http://www.microsoft.com/en-gb/download/details.aspx?id=30403)
    • SSIS package will be very much simplified
    • you don't need to spend hours and hours to change the SSIS package if your XML schema changes. SQLXML is able to create the tables in the SQL Server for you every time you run the package, based on the XSD relationships you provide.
  • Disadvantages

    • creating the XSD may take a while and requires some knowledge. When I did it I learned something new, so this was not a real a disadvantage for me.
    • when seeing how simple the SSIS package is, your manager will have the impression that you didn't do any work.

To view large files use Large Text File Viewer, nice little gem.

Note: The question is quite old, but the "issue" remains hot. I added this post for the developers who Google how to BULK LOAD XML files in SSIS and land here.

like image 99
Dan S Avatar answered Nov 20 '22 11:11

Dan S


Try the free LogParser utility from Microsoft: http://www.microsoft.com/DownLoads/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07&displaylang=en

It's designed to give you SQL-like access to large text files including XML. Something like

Select top 1000 * from myFile.xml

...should work to get you started. Also, beware that the documentation will appear in your start menu along side the executable after installation--I don't think there's a good copy on line.

like image 39
steamer25 Avatar answered Nov 20 '22 13:11

steamer25


See this blog post by unofficial StackOverflow team member Brent Ozar:
http://www.brentozar.com/archive/2009/06/how-to-import-the-stackoverflow-xml-into-sql-server/

like image 23
Joel Coehoorn Avatar answered Nov 20 '22 13:11

Joel Coehoorn


You kind of have to know the schema. Try downloading TextPad or something similar to view the files.

Once you know the schema you can do a couple of things to get them into SQL. One approach would be to use OpenXML http://msdn.microsoft.com/en-us/library/ms186918.aspx.

like image 1
brendan Avatar answered Nov 20 '22 11:11

brendan