Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Bulk Insert - Row Terminator for UNIX file + "\l" row terminator

so I have been wrestling a perplexing issue with BULK INSERT for some time. The files come from a Linux box and when I look at them in hex edit mode/notepad ++ they appear to have just a linefeed (0A) as a row terminator. I store bulk insert statements in a table which later a job selects from and executes the statement in the table to load data into a staging table.

The particular case that is perplexing to me is a table that has 7 columns. The data file only has the first 4 columns, the rest should be left NULL.

Typically they look like this:

BULK INSERT STAGING_TABLE  FROM 'FILE_LOCATION'  
WITH     (   
DATAFILETYPE = 'widechar'
,   FIELDTERMINATOR = ','
,   ROWTERMINATOR = 'something_here'   
);

The row terminator has been the biggest source of my issues.

When I try to use "\n" the bulk insert fails on an truncation error-- it seems to treat the file as one long string and only delimits the columns correctly until it runs out of columns (hence truncation error).

When I use "0x0a" the bulk insert fails on "unexpected end of file" error. There was a blank line at the end of the file but even when I removed that it still threw the same error so I'm not sure what is wrong there.

The ONLY one so far that has worked for getting data actually into the table was "\l". Does anyone know what that means? I have searched far and wide but there doesn't seem to be documentation on it. That or I have been looking in the wrong place completely.

The weird thing with \l as the rowterminator is that even though it load successfully it still doesn't respect the rowterminator... The rows just get loaded into all 7 columns and split on seemingly random intervals.

Anyone have any idea? Should I clarify some more?

like image 458
Razzle Dazzle Avatar asked Dec 05 '14 20:12

Razzle Dazzle


1 Answers

The issue you are having is actually not due to the Row Terminator. I suspect, along with the End of File error, you also saw something similar to the following:

Msg 4864, Level 16, State 1, Line 1
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 4 ({column_name}).

While what I said below the line is still valid regarding the ROWTERMINATOR, the real issue is indicated by your statement of:

[the] table that has 7 columns. The data file only has the first 4 columns, the rest should be left NULL.

This is the issue. When using BULK INSERT, the data file has to have the same number of fields as the table being inserted into. If that is not the case, then you have to use the FORMATFILE ='format_file_path' option in which case you need to create a Format File and specify the location.

I thought you could get away with the easier OPENROWSET(BULK...) so that you can do the following:

INSERT INTO STAGING_TABLE
   SELECT *
   FROM   OPENROWSET(BULK 'FILE_LOCATION' ...);

But that doesn't allow you to specify a ROWTERMINATOR without using a Format File. Hence you need the Format File in either case.

OR, you could just import into a different staging table that only has 4 columns, and then either:

  • dump that into your current STAGING_TABLE, or

  • do an ALTER TABLE to add the 3 missing columns (it is more efficient to just add 3 NULLable fields than to transfer the data from one table to another :-).

OR, as mentioned by @PhilipKelley in a comment on this answer, you could create a View with just those four fields and have that be the destination/target. And if you were doing the appropriate steps to enable the operation to be minimally logged, the MSDN page for Prerequisites for Minimal Logging in Bulk Import does not say one way or the other what the effect will be if you use a View.


Most likely the \l was just interpreted as those two literal characters, hence it not respecting the rowterminator when you tried it.

The 0x0A will work as I have tested it and it behaves as expected. Your statement should look like the following:

BULK INSERT STAGING_TABLE
FROM 'FILE_LOCATION'  
WITH (   
       DATAFILETYPE = 'widechar',
       FIELDTERMINATOR = ',',
       ROWTERMINATOR = '0x0A'
);

I tried both with and without a 0x0A character at the end of the final line and both worked just the same.

I then removed one of the commas from one of the lines, leaving it with less than the full set of fields, and that is when I got the following error:

Msg 4832, Level 16, State 1, Line 2
   Bulk load: An unexpected end of file was encountered in the data file.
Msg 7399, Level 16, State 1, Line 2
   The OLE DB provider "BULK" for linked server "(null)" reported an error. The 
                 provider did not give any information about the error.
Msg 7330, Level 16, State 2, Line 2
   Cannot fetch a row from OLE DB provider "BULK" for linked server "(null)".

Make sure that all of the rows in the data file have the required number of field separators (, in this case). You mentioned having 4 columns in the file so that should be 3 commas per row.

like image 147
Solomon Rutzky Avatar answered Sep 28 '22 03:09

Solomon Rutzky