I'm trying to insert some data into a table from a CSV file in which every field is enclosed in double quotes (""), e.g.:
APPLICANTID,NAME,CONTACT,PHONENO,MOBILENO,FAXNO,EMAIL,ADDR1,ADDR2,ADDR3,STATE,POSTCODE
"3","Snoop Dogg","Snoop Dogg","411","","","","411 High Street","USA","","USA","1111"
"4","LL Cool J","LL Cool J","","","","","5 King Street","","","USA","1111"
I am using an XML format file to try to handle the "" delimiters, because otherwise I believe I would have to update the data again after importing to remove the initial " from each value.
My format file looks like the following:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="NCharTerm" TERMINATOR='",' MAX_LENGTH="12"/>
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="6" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="7" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="8" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="9" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="10" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="11" xsi:type="CharTerm" TERMINATOR=',"' COLLATION="Latin1_General_CI_AS"/>
<FIELD ID="12" xsi:type="CharTerm" TERMINATOR="\r\n" COLLATION="Latin1_General_CI_AS"/>
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="APPLICANTID" xsi:type="SQLINT"/>
<COLUMN SOURCE="2" NAME="NAME" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="3" NAME="CONTACT" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="4" NAME="PHONENO" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="5" NAME="MOBILENO" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="6" NAME="FAXNO" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="7" NAME="EMAIL" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="8" NAME="ADDR1" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="9" NAME="ADDR2" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="10" NAME="ADDR3" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="11" NAME="STATE" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="12" NAME="POSTCODE" xsi:type="SQLCHAR"/>
</ROW>
</BCPFORMAT>
and I am running the import with the following:
BULK INSERT [PracticalDB].dbo.applicant
FROM 'C:\temp.csv'
WITH (KEEPIDENTITY, FORMATFILE='C:\temp.xml', FIRSTROW = 2)
I am getting the error:
Msg 4864, Level 16, State 1, Line 1 Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 2, column 1 (APPLICANTID).
for all of the rows.
I have tried various different combinations for the terminator including using:
TERMINATOR="","
TERMINATOR="\","
TERMINATOR='","
TERMINATOR='\","
and none of them seem to work.
Is there a correct way to escape the " so that it will be parsed correctly, assuming that is my problem here?
OK, so I figured it out!
You can use ' instead of " to delimit the XML attribute values, i.e. TERMINATOR='","', and then you can use " inside them without worrying about escaping.
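As a quick sanity check of the quoting rule, here is a minimal sketch using Python's standard XML parser. It is only an illustration of general XML attribute syntax, not of SQL Server's own format-file parser, and the single FIELD element is a cut-down stand-in for the real format file. It suggests that a single-quoted attribute carries the double quotes through unchanged, that the &quot; entity is an equivalent spelling in plain XML, and that a backslash does not escape a quote (which would explain why the \" attempts in the question were rejected).

```python
import xml.etree.ElementTree as ET

# Single-quoted attribute value: the double quotes inside need no escaping.
single_quoted = ET.fromstring("""<FIELD ID="2" TERMINATOR='","'/>""")

# Double-quoted attribute value using the standard XML entity for ".
entity_escaped = ET.fromstring('<FIELD ID="2" TERMINATOR="&quot;,&quot;"/>')

single_value = single_quoted.get("TERMINATOR")
entity_value = entity_escaped.get("TERMINATOR")

# Backslash is not an escape character in XML, so \" ends the attribute
# value early and the rest of the tag is not well-formed.
try:
    ET.fromstring('<FIELD ID="2" TERMINATOR="\\","/>')
    backslash_ok = True
except ET.ParseError:
    backslash_ok = False

print(repr(single_value))
print(single_value == entity_value)
print(backslash_ok)
```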
I also needed an extra first field to consume the opening " so that the other columns could be parsed correctly. This resulted in the following format file:
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR='"' />
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="3" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="4" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="5" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="6" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="7" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="8" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="9" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="10" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="11" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="12" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="13" xsi:type="CharTerm" TERMINATOR='"\r\n' />
</RECORD>
<ROW>
<COLUMN SOURCE="2" NAME="APPLICANTID" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="3" NAME="NAME" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="4" NAME="CONTACT" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="5" NAME="PHONENO" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="6" NAME="MOBILENO" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="7" NAME="FAXNO" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="8" NAME="EMAIL" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="9" NAME="ADDR1" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="10" NAME="ADDR2" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="11" NAME="ADDR3" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="12" NAME="STATE" xsi:type="SQLNVARCHAR"/>
<COLUMN SOURCE="13" NAME="POSTCODE" xsi:type="SQLCHAR"/>
</ROW>
</BCPFORMAT>
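Here the first field is a throwaway that only consumes the opening " (note that the COLUMN elements start at SOURCE="2", so it is never loaded), the other fields are each terminated by ",", and the final field is terminated by " followed by a newline (\r\n).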
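To see why the throwaway field matters, the terminator logic can be modelled in a few lines of Python. This is a simplified sketch of terminator-based splitting, not SQL Server's actual implementation; the function name split_by_terminators and the three-column sample are made up for illustration.

```python
def split_by_terminators(data, terminators):
    """Split data into rows by consuming each field terminator in order."""
    rows, pos = [], 0
    while pos < len(data):
        row = []
        for term in terminators:
            end = data.find(term, pos)
            if end == -1:
                raise ValueError(f"terminator {term!r} not found after offset {pos}")
            row.append(data[pos:end])
            pos = end + len(term)
        rows.append(row)
    return rows


# Abbreviated, hypothetical sample: three quoted columns per row.
sample = (
    '"3","Snoop Dogg","411 High Street"\r\n'
    '"4","LL Cool J","5 King Street"\r\n'
)

# Without a throwaway field, the opening quote stays in the first column.
naive = split_by_terminators(sample, ['","', '","', '"\r\n'])

# With a throwaway first field terminated by a lone quote, it is consumed.
fixed = split_by_terminators(sample, ['"', '","', '","', '"\r\n'])
fixed_rows = [row[1:] for row in fixed]  # drop the throwaway field

print(naive[0])
print(fixed_rows[0])
```

In this model the naive layout leaves "3 (with its leading quote) in the first column, which is the kind of value that cannot be converted to an integer, while the layout with the throwaway field yields clean values.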
Tip: if only some of the fields are double-quoted, use the OPENROWSET(BULK ...) version of the bulk insert instead. That way you can manipulate the field content coming from the input file before it is inserted into the target table.
In that manipulation you can do anything with the field content, e.g. remove the double quotes. I have not measured the effect on performance.