I am trying to load a large amount data in SQL server from a flat file using BULK INSERT. However, my file has varying number of columns, for instance the first row contains 14 and the second contains 4. That is OK, I just want to make a table with the max number of columns and load the file into it with NULLs for the missing columns. I can play with it from that point. But it seems that SQL Server, when reaching the end of the line and having more columns to fill for that same row in the destination table, just moves on to the next line and attempts to put the data on that line to the wrong column of the table.
Is there a way to get the behavior that I am looking for? Is there an option that I can use to specify this? Has anyone run into this before?
Here is the code
BULK INSERT #t
FROM '<path to file>'
WITH
(
DATAFILETYPE = 'char',
KEEPNULLS,
FIELDTERMINATOR = '#'
)
BULK INSERT isn't particularly flexible. One work-around is to load each row of data into an interim table that contains a single big varchar column. Once loaded, you then parse each row using your own routines.
My workaround (tested in T-SQL):
In last table column, you will find all rest items (including your item separator)
If it is necessery for you, create another full-columned table, copy all columns from first table, and do some parsing only over last column.
Example file
alpha , beta , gamma
one , two , three , four
will look like this in your table:
c1 | c2 | c3
"alpha" | "beta" | "gamma"
"one" | "two" | "three , four"
Another workaround is to preprocess the file. It may be easier to write a small standalone program to add terminators to each line so it can be BULK loaded properly than to parse the lines using T-SQL.
Here's one example in VB6/VBA. It's certainly not as fast as the SQL Server bulk insert, but it just preprocessed 91000 rows in 10 seconds.
Sub ColumnDelimiterPad(FileName As String, OutputFileName As String, ColumnCount As Long, ColumnDelimiter As String, RowDelimiter As String)
Dim FileNum As Long
Dim FileData As String
FileNum = FreeFile()
Open FileName For Binary Access Read Shared As #FileNum
FileData = Space$(LOF(FileNum))
Debug.Print "Reading File " & FileName & "..."
Get #FileNum, , FileData
Close #FileNum
Dim Patt As VBScript_RegExp_55.RegExp
Dim Matches As VBScript_RegExp_55.MatchCollection
Set Patt = New VBScript_RegExp_55.RegExp
Patt.IgnoreCase = True
Patt.Global = True
Patt.MultiLine = True
Patt.Pattern = "[^" & RowDelimiter & "]+"
Debug.Print "Parsing..."
Set Matches = Patt.Execute(FileData)
Dim FileLines() As String
Dim Pos As Long
Dim MissingDelimiters
ReDim FileLines(Matches.Count - 1)
For Pos = 0 To Matches.Count - 1
If (Pos + 1) Mod 10000 = 0 Then Debug.Print Pos + 1
FileLines(Pos) = Matches(Pos).Value
MissingDelimiters = ColumnCount - 1 - Len(FileLines(Pos)) + Len(Replace(FileLines(Pos), ColumnDelimiter, ""))
If MissingDelimiters > 0 Then FileLines(Pos) = FileLines(Pos) & String(MissingDelimiters, ColumnDelimiter)
Next
If (Pos + 1) Mod 10000 <> 0 Then Debug.Print Pos + 1
If Dir(OutputFileName) <> "" Then Kill OutputFileName
Open OutputFileName For Binary Access Write Lock Read Write As #FileNum
Debug.Print "Writing " & OutputFileName & "..."
Put #FileNum, , Join(FileLines, RowDelimiter)
Close #FileNum
Debug.Print "Done."
End Sub
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With