
BULK INSERT with inconsistent number of columns

I am trying to load a large amount of data into SQL Server from a flat file using BULK INSERT. However, my file has a varying number of columns; for instance, the first row contains 14 and the second contains 4. That is OK; I just want to make a table with the maximum number of columns and load the file into it with NULLs for the missing columns. I can play with it from that point. But it seems that when SQL Server reaches the end of a line and still has more columns to fill for that row in the destination table, it just moves on to the next line and puts that line's data into the wrong columns of the table.

Is there a way to get the behavior that I am looking for? Is there an option that I can use to specify this? Has anyone run into this before?

Here is the code:

BULK INSERT #t
FROM '<path to file>'
WITH 
(
  DATAFILETYPE = 'char',
  KEEPNULLS,
  FIELDTERMINATOR = '#'
)

asked Apr 08 '10 by aceinthehole

3 Answers

BULK INSERT isn't particularly flexible. One work-around is to load each row of data into an interim table that contains a single big varchar column. Once loaded, you then parse each row using your own routines.
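
A minimal sketch of that staging-table approach, assuming the question's '#' field terminator; the table and column names are illustrative, and the staging column must be wide enough for the longest line:

-- Staging table: one wide column holds each raw line
CREATE TABLE #raw (line varchar(8000));

-- Load whole lines; this assumes the data contains no tab characters,
-- so the default field terminator never splits a line
BULK INSERT #raw
FROM '<path to file>'
WITH (DATAFILETYPE = 'char', ROWTERMINATOR = '\n');

-- Example parse: pull the first '#'-delimited field out of each line
SELECT CASE WHEN CHARINDEX('#', line) > 0
            THEN LEFT(line, CHARINDEX('#', line) - 1)
            ELSE line END AS col1,
       line AS raw_line
FROM #raw;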

answered by Philip Kelley


My workaround (tested in T-SQL):

  1. Create a table whose column count equals the minimum column count of your import file
  2. Run the BULK INSERT (it will now succeed)

The last table column will contain all of the remaining items (including your field separator).

If necessary, create another table with the full set of columns, copy everything over from the first table, and parse only the last column (a T-SQL sketch follows the example below).

Example file

alpha , beta , gamma
one   , two  , three , four

will look like this in your table:

c1      | c2     | c3
"alpha" | "beta" | "gamma"
"one"   | "two"  | "three , four"

answered by Trivius


Another workaround is to preprocess the file. It may be easier to write a small standalone program that adds terminators to each line, so the file can be bulk loaded properly, than to parse the lines in T-SQL.

Here's one example in VB6/VBA. It's certainly not as fast as SQL Server's bulk insert, but it just preprocessed 91,000 rows in 10 seconds.

Sub ColumnDelimiterPad(FileName As String, OutputFileName As String, ColumnCount As Long, ColumnDelimiter As String, RowDelimiter As String)
   ' Requires a project reference to "Microsoft VBScript Regular Expressions 5.5"
   Dim FileNum As Long
   Dim FileData As String

   FileNum = FreeFile()
   Open FileName For Binary Access Read Shared As #FileNum
   FileData = Space$(LOF(FileNum))
   Debug.Print "Reading File " & FileName & "..."
   Get #FileNum, , FileData
   Close #FileNum

   Dim Patt As VBScript_RegExp_55.RegExp
   Dim Matches As VBScript_RegExp_55.MatchCollection

   Set Patt = New VBScript_RegExp_55.RegExp
   Patt.IgnoreCase = True
   Patt.Global = True
   Patt.MultiLine = True
   ' Each match is one line: a run of characters that are not the row delimiter
   Patt.Pattern = "[^" & RowDelimiter & "]+"
   Debug.Print "Parsing..."
   Set Matches = Patt.Execute(FileData)

   Dim FileLines() As String
   Dim Pos As Long
   Dim MissingDelimiters As Long

   ReDim FileLines(Matches.Count - 1)
   For Pos = 0 To Matches.Count - 1
      If (Pos + 1) Mod 10000 = 0 Then Debug.Print Pos + 1
      FileLines(Pos) = Matches(Pos).Value
      ' Delimiters still needed = (ColumnCount - 1) minus the delimiters already present
      ' (the Len difference counts them; assumes a single-character ColumnDelimiter)
      MissingDelimiters = ColumnCount - 1 - Len(FileLines(Pos)) + Len(Replace(FileLines(Pos), ColumnDelimiter, ""))
      If MissingDelimiters > 0 Then FileLines(Pos) = FileLines(Pos) & String(MissingDelimiters, ColumnDelimiter)
   Next
   ' Print the final row count if it wasn't already printed inside the loop
   If Matches.Count Mod 10000 <> 0 Then Debug.Print Matches.Count

   If Dir(OutputFileName) <> "" Then Kill OutputFileName
   Open OutputFileName For Binary Access Write Lock Read Write As #FileNum
   Debug.Print "Writing " & OutputFileName & "..."
   Put #FileNum, , Join(FileLines, RowDelimiter)
   Close #FileNum
   Debug.Print "Done."
End Sub
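
Once the file has been padded so every line carries the full set of field terminators, a plain BULK INSERT into the full-width table should work. A minimal sketch, assuming 14 columns and the question's '#' terminator; the table name and column widths are illustrative:

CREATE TABLE #wide (
    c01 varchar(255), c02 varchar(255), c03 varchar(255), c04 varchar(255),
    c05 varchar(255), c06 varchar(255), c07 varchar(255), c08 varchar(255),
    c09 varchar(255), c10 varchar(255), c11 varchar(255), c12 varchar(255),
    c13 varchar(255), c14 varchar(255)
);

BULK INSERT #wide
FROM '<path to preprocessed file>'
WITH (DATAFILETYPE = 'char', KEEPNULLS, FIELDTERMINATOR = '#', ROWTERMINATOR = '\n');

With KEEPNULLS, the padded-out empty fields load as NULL, which matches what the question asks for.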

answered by ErikE