I have 6 million rows worth of data I want to insert into my SQL Server database. I can do it the slow way with 6 million INSERT statements (by my calculation it would take 18 hours to run) or I can try BULK INSERT.
BULK INSERT has issues with not being able to escape characters, but the data in this case is very simple and so shouldn't run into this problem.
However SQL Server doesn't seem to like any form of date/time data to be inserted into a field.
Here is the table (psuedo-SQL)
CREATE TABLE Tasks (
TaskId bigint NOT NULL IDENTITY(1,1) PRIMARY KEY,
TriggerId bigint NOT NULL FOREIGN KEY,
Created datetime NOT NULL,
Modified datetime NOT NULL,
ScheduledFor datetime NULL,
LastRan datetime NULL,
-- and about 10 more fields after this
)
Here is my BULK INSERT statement:
SET DATEFORMAT dmy
BULK INSERT Tasks
FROM 'C:\TasksBulk.dat'
WITH (
-- CHECK_CONSTRAINTS is not necessary as the only constraints are always enforced regardless of this option (UNIQUE, PRIMARY KEY, and NOT NULL)
CODEPAGE = 'RAW',
DATAFILETYPE = 'native',
KEEPIDENTITY,
MAXERRORS = 1,
ORDER ( CallId ASC ),
FIELDTERMINATOR = '\t',
ROWTERMINATOR = '\0'
)
And here is the first row of data in TasksBulk.dat:
1000\t1092\t01/01/2010 04:00:17\t01/01/2010 04:00:17\t\t01/01/2010 04:00:14\0
(For readability, reformatted with tab characters replaced with 4-spaces:)
1000 1092 01/01/2010 04:00:17 01/01/2010 04:00:17 01/01/2010 04:00:14\0
However when I run the BULK INSERT statement, I get this error:
Msg 4864, Level 16, State 1, Line 2 Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 3 (Created).
I have tried using different row and field terminators and every different date/time format (including "01/01/2010", "2010-01-01", both with and without the "04:00:17" time component). I don't know what I'm doing wrong here.
It turns out that changing the DATAFILETYPE from 'native' to 'char' solved the problem. The 'native' type implies a strict data format for everything, whereas 'char' is meant for more plaintext files.
You have your CODDEPAGE
set to RAW
(presumably for speed).
The error message implies your data contains characters outside the codepage.
CODEPAGE [ = 'ACP' | 'OEM' | 'RAW' | 'code_page' ]
Specifies the code page of the data in the data file. CODEPAGE is relevant only if the data contains char, varchar, or text columns with character values greater than 127 or less than 32.
But that could be misleading. Your example data line contains a missing column. If you don't use a format file, every field in the table must be used.
So you could either create a format file or create a staging table with varchar(25) for the datetime columns, importing and then perform an update from the staging table into the destination table. That way you have more control over conversions and missing data.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With