Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Best way to bulk insert XML data into SQL Server 2005 database

I have to shred around 25 - 30 XMLs into my SQL Server 2005 database (the total size would be around 10 MB). And I need this logic to run automatically as soon as new xml files are copied to the server.

Ive read many posts on this site and also other sites, but can't still conclude on what must I use to shred data.

Pls let me know which option must I should go with

  1. SqlBulk Copy
  2. C# deserialization
  3. SSIS

I have to create C# classes for my data models. So C# deserialization was my first choice. But pls let me know which option will be right from a performance perspective.

Another thing I forgot to mention was the structure of the XML files will vary. It wouldnt be the same. I will have tables that will have all the columns that could possibly be populated. But the xmls will not have all the data at all times.

Sample of the xml

<?xml version="1.0" encoding="utf-8"?>
<estateList date="2012-08-06T12:17:05">
  <uniqueID>22XXln</uniqueID>
  <category name="Apartment" /> 
  <listingAgent>
     <name>DIW Office</name>  
     <telephone type="BH">96232 2345</telephone> 
     <telephone type="BH">9234 2399</telephone>
     <email>[email protected]</email>    
  </listingAgent>
  <inspectionTimes /> 
  <description>AVAILABLE NOW. </description> 
  <price>0</price>  
  <address display="yes">      
    <street>Lachlsan Street</street>        
    <ImagesContainer>        
       <img id="m" modTime="2012-08-06-12:17:05" url="http://images/2409802.jpg" format="jpg" /> 
       <img id="a" modTime="2012-08-06-12:17:05" /> 
    </ImagesContainer>     
  </address>
</estateList>

Thanks.

like image 498
Deefa Avatar asked Oct 07 '22 23:10

Deefa


1 Answers

Given you have your XML in a SQL variabe, you can pretty easily parse out most of the info using straight T-SQL with the XQuery support added in SQL Server 2005.

Try something like:

DECLARE @Input XML = '<estateList date="2012-08-06T12:17:05">
  <uniqueID>22XXln</uniqueID>
  <category name="Apartment" /> 
  <listingAgent>
     <name>DIW Office</name>  
     <telephone type="BH">96232 2345</telephone> 
     <telephone type="BH">9234 2399</telephone>
     <email>[email protected]</email>    
  </listingAgent>
  <inspectionTimes /> 
  <description>AVAILABLE NOW. </description> 
  <price>0</price>  
  <address display="yes">      
    <street>Lachlsan Street</street>        
    <ImagesContainer>        
       <img id="m" modTime="2012-08-06-12:17:05" url="http://images/2409802.jpg" format="jpg" /> 
       <img id="a" modTime="2012-08-06-12:17:05" /> 
    </ImagesContainer>     
  </address>
</estateList>'

SELECT
    EstateListDate = EstL.value('@date', 'datetime'),
    UniqueID = EstL.value('(uniqueID)[1]', 'varchar(20)'),
    Category = EstL.value('(category/@name)[1]', 'varchar(20)'),
    ListingAgentName = EstL.value('(listingAgent/name)[1]', 'varchar(50)'),
    ListingAgentTel = EstL.value('(listingAgent/telephone)[1]', 'varchar(50)'),
    ListingAgentEMail = EstL.value('(listingAgent/email)[1]', 'varchar(250)'),
    [Description] = EstL.value('(description)[1]', 'varchar(250)'),
    Price = EstL.value('(price)[1]', 'decimal(14,2)'),
    DisplayAddress = EstL.value('(address/@display)[1]', 'varchar(10)'),
    AddressStreet = EstL.value('(address/street)[1]', 'varchar(100)')
FROM @input.nodes('/estateList') AS Tbl(EstL)

and you should get:

enter image description here

This data could be easily inserted into a table. And this query could be run against any number of XML files on disk, using a fairly easy SSIS package (enumerate the XML, load each into a SQL variable, parse it, insert data into tables etc.)

BUT: the challenging part is going to be questions like:

  • can there be more than one listing agent? And if yes : how to handle that?
  • can there be more than one phone number and how to deal with that?
  • what do to with the multiple images per address

and so forth ....

Update: this query here would extract the UniqueID and each complete <img> tag's information from that XML input and display it (or insert it into another table):

SELECT
    UniqueID = @input.value('(/estateList/uniqueID)[1]', 'varchar(20)'),
    ImageID = Images.value('(img/@id)[1]', 'varchar(20)'),
    ImageModTime = Images.value('(img/@modTime)[1]', 'varchar(50)'),
    ImageFormat = Images.value('(img/@format)[1]', 'varchar(20)'),
    ImageURL = Images.value('(img/@url)[1]', 'varchar(250)')
FROM 
    @input.nodes('/estateList/address/ImagesContainer') AS Tbl(Images)
like image 101
marc_s Avatar answered Oct 13 '22 10:10

marc_s