Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Sql Bulk insert XML format file with double quotes in terminator

I'm trying to insert some data into a table from a csv document which has all of the fields delimited with ""

ie.

 APPLICANTID,NAME,CONTACT,PHONENO,MOBILENO,FAXNO,EMAIL,ADDR1,ADDR2,ADDR3,STATE,POSTCODE
 "3","Snoop Dogg","Snoop Dogg","411","","","","411 High Street","USA 
 ","","USA", "1111" "4","LL Cool J","LL Cool J","","","","","5 King
 Street","","","USA","1111"

I am using an xml format file to try and overcome the "" delimiters as I believe I would have to update the data again after importing to remove the inital " if it did not.

My format file looks like the following:

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="NCharTerm" TERMINATOR='",' MAX_LENGTH="12"/>
  <FIELD ID="2" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="3" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="4" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="5" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="6" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="7" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="8" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="9" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="10" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="11" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
  <FIELD ID="12" xsi:type="CharTerm" TERMINATOR="\r\n" COLLATION="Latin1_General_CI_AS"/>
 </RECORD>
 <ROW>
  <COLUMN SOURCE="1" NAME="APPLICANTID" xsi:type="SQLINT"/>
  <COLUMN SOURCE="2" NAME="NAME" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="3" NAME="CONTACT" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="4" NAME="PHONENO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="5" NAME="MOBILENO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="6" NAME="FAXNO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="7" NAME="EMAIL" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="8" NAME="ADDR1" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="9" NAME="ADDR2" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="10" NAME="ADDR3" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="11" NAME="STATE" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="12" NAME="POSTCODE" xsi:type="SQLCHAR"/>
 </ROW>
</BCPFORMAT>

and I am running the import with the following:

BULK INSERT [PracticalDB].dbo.applicant 
FROM 'C:\temp.csv'
WITH (KEEPIDENTITY, FORMATFILE='C:\temp.xml', FIRSTROW = 2)

I am getting the error:

Msg 4864, Level 16, State 1, Line 1 Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2, column 1 (APPLICANTID).

for all of the rows.

I have tried various different combinations for the terminator including using:

TERMINATOR="&quot;,"
TERMINATOR="\","
TERMINATOR='","
TERMINATOR='\","

and none of them seem to work.

Is there a correct way to escape the " so that it will be parsed correctly, assuming that that is my problem here.

like image 620
Daniel Powell Avatar asked Dec 16 '11 05:12

Daniel Powell


People also ask

What is field terminator in bulk insert?

FIRSTROW is what row in the file it should start importing from. We have a header row in our file so the data starts on line 2. FIELDTERMINATOR is what the separator is between fields, a comma in our case.

Which is the default field terminator for bulk insert in SQL Server?

Specifying Terminators for Bulk Import Specifies the field terminator to be used for character and Unicode character data files. The default is \t (tab character).

How do you specify text qualifier in bulk insert?

You need to use a 'format file' to implement a text qualifier for bulk insert. Essentially, you will need to teach the bulk insert that there's potentially different delimiters in each field. Create a text file called "level_2. fmt" and save it.

What is row Terminator 0x0a?

0x0a is simply a hex representation of the ascii newline character, commonly stylised as \n . – TZHX. Aug 2 at 11:21. I tried to query a JSON file using 0x0a and then \n as the ROWTERMINATOR.


2 Answers

Ok so I figured it out!

You can use ' instead of " when you are defining the xml attributes ie TERMINATOR='', then you can use the " within them without worrying.

Also I needed to eat the first " with a field so the other columns could be parsed correctly. This ended up with the format file

<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
 <RECORD>
  <FIELD ID="1" xsi:type="CharTerm" TERMINATOR='"' />
  <FIELD ID="2" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="3" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="4" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="5" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="6" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="7" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="8" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="9" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="10" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="11" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="12" xsi:type="CharTerm" TERMINATOR='","' />
  <FIELD ID="13" xsi:type="CharTerm" TERMINATOR='"\r\n' />
 </RECORD>
 <ROW>
  <COLUMN SOURCE="2" NAME="APPLICANTID" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="3" NAME="NAME" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="4" NAME="CONTACT" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="5" NAME="PHONENO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="6" NAME="MOBILENO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="7" NAME="FAXNO" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="8" NAME="EMAIL" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="9" NAME="ADDR1" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="10" NAME="ADDR2" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="11" NAME="ADDR3" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="12" NAME="STATE" xsi:type="SQLNVARCHAR"/>
  <COLUMN SOURCE="13" NAME="POSTCODE" xsi:type="SQLCHAR"/>
 </ROW>
</BCPFORMAT>

Where the first field is just a throw away one to remove the first " and the other fields all separate on "," and the final separates on "(newline)

like image 190
Daniel Powell Avatar answered Oct 24 '22 19:10

Daniel Powell


Tip: if only some of the fields are doubleqouted, then use the openrowset version of the bulk insert, and doing so, you can manipulate the field content coming from the input file before inserting into the target table.

In the manipulation you can do anything with the field content, e.g. removing double-quotes. The effect on the performance is not mentioned here, I have no measures regarding this.

like image 42
Estevez Avatar answered Oct 24 '22 19:10

Estevez