I'm using a XML format file to import a CSV file, and the first data row is getting skipped. I can't figure out why.
Format file
<?xml version="1.0"?>
<BCPFORMAT xmlns="http://schemas.microsoft.com/sqlserver/2004/bulkload/format"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<RECORD>
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR='","' />
<FIELD ID="2" xsi:type="CharTerm" TERMINATOR='\n' />
</RECORD>
<ROW>
<COLUMN SOURCE="1" NAME="COLUMN1" xsi:type="SQLVARYCHAR" />
<COLUMN SOURCE="2" NAME="COLUMN2" xsi:type="SQLVARYCHAR" />
</ROW>
</BCPFORMAT>
CSV
COLUMN1,COLUMN2
"ABC","ABC123456"
"TNT","TNT123456"
Query
SELECT *
FROM OPENROWSET(BULK 'C:\sample.csv',
FORMATFILE='C:\sample.xml',
FIRSTROW = 2) AS a
Result
COLUMN1 COLUMN2
------- ----------
"TNT TNT123456"
(1 row(s) affected)
If FIRSTROW
is changed to 1
, the result becomes:
COLUMN1 COLUMN2
--------------------- ----------
COLUMN1,COLUMN2 "ABC ABC123456"
"TNT TNT123456"
If the header row is removed from the CSV and FIRSTROW
is changed to 1
, the result returns as expected:
COLUMN1 COLUMN2
------- ----------
"ABC ABC123456"
"TNT TNT123456"
Since this is an automated report that is delivered with headers, are there any other options to remedy this?
There are a couple of problems here:
I suspect there isn't a valid \n
on the first line. Otherwise SQL Server wouldn't munge the first two rows when you change to FIRSTROW = 1
.
Using ","
as the column delimiter works great for all of the columns except the first and the last column. This leaves a leading "
on the first column, and a trailing "
on the last column. You can deal with the latter by changing your ROWTERMINATOR
to "\n
, but that will only work if you can also add a trailing "
to the header row (during the process of ensuring that there is a valid \n
there). At that point you may as well make sure that the header row matches the data rows in all aspects, so:
"COLUMN1","COLUMN2"
-------------------^ this character has to be \n
In all honesty, I think you could spend a week fighting with all of these nitty-gritty BCP
and BULK INSERT
issues, and still not have a perfect solution that doesn't require post-op actions (such as trimming leading/trailing "
characters from certain columns). My recommendation: spend 20 minutes and write a parser in C# that will automatically correct these files - removing the header row, ensuring the right delimiters are in place, removing all the stupid "
, etc. before SQL Server ever sees the file. Cleaning the file up will be a lot less hassle than the hoops you're jumping through now. I'm sure there are solutions to this but IIRC you've been wrestling with it for quite some time...
The terminator of the first field should be ',' only, not '","'.
Replace with the following line, it's gonna work:
<FIELD ID="1" xsi:type="CharTerm" TERMINATOR=',' />
Here's what happens with your original file format...
The first columns is terminating with : ","... That means, SQL server parses the first line, then reads the seconds line and get the first field:
COLUMN1,COLUMN2 "ABC
The it continues to read and get the second field (remember, we are still on the second line of the file) :
ABC123456"
It now has the first row...
It then reads the next row:
"TNT TNT123456"
So when you skip first row, it indeeds skips the first row because your first line is not using quotes...
Hope that helps..
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With