Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Load csv file into a VBA array rather than Excel Sheet

Tags:

excel

vba

I am currently able to enter csv file data into Excel VBA by uploading the data via the code below then handling the table, surely not the best way as I am only interested in some of the data and delete the sheet after using the data:

Sub CSV_Import() 
Dim ws As Worksheet, strFile As String 

Set ws = ActiveSheet 'set to current worksheet name 

strFile = Application.GetOpenFilename("Text Files (*.csv),*.csv", ,"Please select text file...") 

With ws.QueryTables.Add(Connection:="TEXT;" & strFile, Destination:=ws.Range("A1")) 
     .TextFileParseType = xlDelimited 
     .TextFileCommaDelimiter = True 
     .Refresh 
End With 
End Sub 

Is it possible to simply load the csv into a two dimensional variant array in VBA rather than going through the use of an excel worksheet?

like image 476
Steve Avatar asked Sep 04 '12 08:09

Steve


People also ask

How do I import a CSV file into an array?

Use numpy. loadtxt() to Read a CSV File Into an Array in Python. As the name suggests, the open() function is used to open the CSV file. NumPy's loadtxt() function helps in loading the data from a text file.

How do I use an array as a range in excel VBA?

Steps to Add a Range into an Array in VBA First, you need to declare a dynamic array using the variant data type. Next, you need to declare one more variable to store the count of the cells from the range and use that counter for the loop as well. After that, assign the range where you have value to the array.


4 Answers

Okay, looks like you need two things: stream the data from the file, and populate a 2-D array.

I have a 'Join2d' and a 'Split2d' function lying around (I recall posting them in another reply on StackOverflow a while ago). Do look at the comments in the code, there are things you might need to know about efficient string-handling if you're handling large files.

However, it's not a complicated function to use: just paste the code if you're in a hurry.

Streaming the file is simple BUT we're making assumptions about the file format: are the lines in the file delimited by Carriage-Return characters or the Carriage-Return-and-Linefeed character pair? I'm assuming 'CR' rather than CRLF, but you need to check that.

Another assumption about the format is that numeric data will appear as-is, and string or character data will be encapsulated in quote marks. This should be true, but often isn't... And stripping out the quote marks adds a lot of processing - lots of allocating and deallocating strings - which you really don't want to be doing in a large array. I've short-cut the obvious cell-by-cell find-and-replace, but it's still an issue on large files.

If your file has commas embedded in the string values, this code won't work: and don't try to code up a parser that picks out the encapsulated text and skips these embedded commas when splitting-up the rows of data into individual fields, because this intensive string-handling can't be optimised into a fast and reliable csv reader by VBA.

Anyway: here's the source code: watch out for line-breaks inserted by StackOverflow's textbox control:

Running the code:

Note that you'll need a reference to the Microsoft Scripting Runtime (system32\scrrun32.dll)

Private Sub test()     Dim arrX As Variant     arrX = ArrayFromCSVfile("MyFile.csv") End Sub 

Streaming a csv file.

Note that I'm assuming your file is in the temp folder: C:\Documents and Settings[$USERNAME]\Local Settings\Temp You'll need to use filesystem commands to copy the file into a local folder: it's always quicker than working across the network.

Public Function ArrayFromCSVfile( _     strName As String, _     Optional RowDelimiter As String = vbCr, _     Optional FieldDelimiter = ",", _     Optional RemoveQuotes As Boolean = True _ ) As Variant      ' Load a file created by FileToArray into a 2-dimensional array     ' The file name is specified by strName, and it is exected to exist     ' in the user's temporary folder. This is a deliberate restriction:     ' it's always faster to copy remote files to a local drive than to     ' edit them across the network      ' RemoveQuotes=TRUE strips out the double-quote marks (Char 34) that     ' encapsulate strings in most csv files.      On Error Resume Next      Dim objFSO As Scripting.FileSystemObject     Dim arrData As Variant     Dim strFile As String     Dim strTemp As String      Set objFSO = New Scripting.FileSystemObject     strTemp = objFSO.GetSpecialFolder(Scripting.TemporaryFolder).ShortPath     strFile = objFSO.BuildPath(strTemp, strName)     If Not objFSO.FileExists(strFile) Then  ' raise an error?         Exit Function     End If      Application.StatusBar = "Reading the file... (" & strName & ")"      If Not RemoveQuotes Then         arrData = Join2d(objFSO.OpenTextFile(strFile, ForReading).ReadAll, RowDelimiter, FieldDelimiter)         Application.StatusBar = "Reading the file... Done"     Else         ' we have to do some allocation here...          strTemp = objFSO.OpenTextFile(strFile, ForReading).ReadAll         Application.StatusBar = "Reading the file... Done"          Application.StatusBar = "Parsing the file..."          strTemp = Replace$(strTemp, Chr(34) & RowDelimiter, RowDelimiter)         strTemp = Replace$(strTemp, RowDelimiter & Chr(34), RowDelimiter)         strTemp = Replace$(strTemp, Chr(34) & FieldDelimiter, FieldDelimiter)         strTemp = Replace$(strTemp, FieldDelimiter & Chr(34), FieldDelimiter)          If Right$(strTemp, Len(strTemp)) = Chr(34) Then             strTemp = Left$(strTemp, Len(strTemp) - 1)         End If          If Left$(strTemp, 1) = Chr(34) Then             strTemp = Right$(strTemp, Len(strTemp) - 1)         End If          Application.StatusBar = "Parsing the file... Done"         arrData = Split2d(strTemp, RowDelimiter, FieldDelimiter)         strTemp = ""     End If      Application.StatusBar = False      Set objFSO = Nothing     ArrayFromCSVfile = arrData     Erase arrData End Function 

Split2d Creates a 2-dimensional VBA array from a string:

Public Function Split2d(ByRef strInput As String, _     Optional RowDelimiter As String = vbCr, _     Optional FieldDelimiter = vbTab, _     Optional CoerceLowerBound As Long = 0 _     ) As Variant      ' Split up a string into a 2-dimensional array.      ' Works like VBA.Strings.Split, for a 2-dimensional array.     ' Check your lower bounds on return: never assume that any array in     ' VBA is zero-based, even if you've set Option Base 0     ' If in doubt, coerce the lower bounds to 0 or 1 by setting     ' CoerceLowerBound     ' Note that the default delimiters are those inserted into the     '  string returned by ADODB.Recordset.GetString      On Error Resume Next      ' Coding note: we're not doing any string-handling in VBA.Strings -     ' allocating, deallocating and (especially!) concatenating are SLOW.     ' We're using the VBA Join & Split functions ONLY. The VBA Join,     ' Split, & Replace functions are linked directly to fast (by VBA     ' standards) functions in the native Windows code. Feel free to     ' optimise further by declaring and using the Kernel string functions     ' if you want to.      ' ** THIS CODE IS IN THE PUBLIC DOMAIN **     '    Nigel Heffernan   Excellerando.Blogspot.com      Dim i   As Long     Dim j   As Long      Dim i_n As Long     Dim j_n As Long      Dim i_lBound As Long     Dim i_uBound As Long     Dim j_lBound As Long     Dim j_uBound As Long      Dim arrTemp1 As Variant     Dim arrTemp2 As Variant      arrTemp1 = Split(strInput, RowDelimiter)      i_lBound = LBound(arrTemp1)     i_uBound = UBound(arrTemp1)      If VBA.LenB(arrTemp1(i_uBound)) <= 0 Then         ' clip out empty last row: a common artifact in data          'loaded from files with a terminating row delimiter         i_uBound = i_uBound - 1     End If      i = i_lBound     arrTemp2 = Split(arrTemp1(i), FieldDelimiter)      j_lBound = LBound(arrTemp2)     j_uBound = UBound(arrTemp2)      If VBA.LenB(arrTemp2(j_uBound)) <= 0 Then      ' ! potential error: first row with an empty last field...         j_uBound = j_uBound - 1     End If      i_n = CoerceLowerBound - i_lBound     j_n = CoerceLowerBound - j_lBound      ReDim arrData(i_lBound + i_n To i_uBound + i_n, j_lBound + j_n To j_uBound + j_n)      ' As we've got the first row already... populate it     ' here, and start the main loop from lbound+1      For j = j_lBound To j_uBound         arrData(i_lBound + i_n, j + j_n) = arrTemp2(j)     Next j      For i = i_lBound + 1 To i_uBound Step 1          arrTemp2 = Split(arrTemp1(i), FieldDelimiter)          For j = j_lBound To j_uBound Step 1             arrData(i + i_n, j + j_n) = arrTemp2(j)         Next j          Erase arrTemp2      Next i      Erase arrTemp1      Application.StatusBar = False      Split2d = arrData  End Function 

Join2D Turns a 2-dimensional VBA array to a string:

Public Function Join2d(ByRef InputArray As Variant, _     Optional RowDelimiter As String = vbCr, _     Optional FieldDelimiter = vbTab, _     Optional SkipBlankRows As Boolean = False _     ) As String      ' Join up a 2-dimensional array into a string. Works like the standard     '  VBA.Strings.Join, for a 2-dimensional array.     ' Note that the default delimiters are those inserted into the string     '  returned by ADODB.Recordset.GetString      On Error Resume Next      ' Coding note: we're not doing any string-handling in VBA.Strings -     ' allocating, deallocating and (especially!) concatenating are SLOW.     ' We're using the VBA Join & Split functions ONLY. The VBA Join,     ' Split, & Replace functions are linked directly to fast (by VBA     ' standards) functions in the native Windows code. Feel free to     ' optimise further by declaring and using the Kernel string functions     ' if you want to.      ' ** THIS CODE IS IN THE PUBLIC DOMAIN **     '   Nigel Heffernan   Excellerando.Blogspot.com      Dim i As Long     Dim j As Long      Dim i_lBound As Long     Dim i_uBound As Long     Dim j_lBound As Long     Dim j_uBound As Long      Dim arrTemp1() As String     Dim arrTemp2() As String      Dim strBlankRow As String      i_lBound = LBound(InputArray, 1)     i_uBound = UBound(InputArray, 1)      j_lBound = LBound(InputArray, 2)     j_uBound = UBound(InputArray, 2)      ReDim arrTemp1(i_lBound To i_uBound)     ReDim arrTemp2(j_lBound To j_uBound)      For i = i_lBound To i_uBound          For j = j_lBound To j_uBound             arrTemp2(j) = InputArray(i, j)         Next j          arrTemp1(i) = Join(arrTemp2, FieldDelimiter)      Next i      If SkipBlankRows Then          If Len(FieldDelimiter) = 1 Then             strBlankRow = String(j_uBound - j_lBound, FieldDelimiter)         Else             For j = j_lBound To j_uBound                 strBlankRow = strBlankRow & FieldDelimiter             Next j         End If          Join2d = Replace(Join(arrTemp1, RowDelimiter), strBlankRow, RowDelimiter, "")         i = Len(strBlankRow & RowDelimiter)          If Left(Join2d, i) = strBlankRow & RowDelimiter Then             Mid$(Join2d, 1, i) = ""         End If      Else          Join2d = Join(arrTemp1, RowDelimiter)      End If      Erase arrTemp1  End Function 

Share and enjoy.

like image 136
Nigel Heffernan Avatar answered Sep 27 '22 18:09

Nigel Heffernan


Yes read it as a text file.

See this example

Option Explicit  Sub Sample()     Dim MyData As String, strData() As String      Open "C:\MyFile.CSV" For Binary As #1     MyData = Space$(LOF(1))     Get #1, , MyData     Close #1     strData() = Split(MyData, vbCrLf) End Sub 

FOLLOWUP

Like I mentioned below in the comments, AFAIK, there is no direct way of filling a 2d Array from a csv. You will have to use the code that I gave above and then split it per line and finally filling up a 2D array which can be cumbersome. Filling up a column is easy but if you specifically want say from Row 5 to Col 7 Data then it becomes cumbersome as you will have to check if there are sufficient columns/rows in the data. Here is a basic example to get Col B in a 2D Array.

NOTE: I have not done any error handling. I am sure you can take care of that.

Let's say our CSV File looks likes this.

enter image description here

When you run this code

Option Explicit  Const Delim As String = ","  Sub Sample()     Dim MyData As String, strData() As String, TmpAr() As String     Dim TwoDArray() As String     Dim i As Long, n As Long      Open "C:\Users\Siddharth Rout\Desktop\Sample.CSV" For Binary As #1     MyData = Space$(LOF(1))     Get #1, , MyData     Close #1     strData() = Split(MyData, vbCrLf)      n = 0      For i = LBound(strData) To UBound(strData)         If Len(Trim(strData(i))) <> 0 Then             TmpAr = Split(strData(i), Delim)             n = n + 1             ReDim Preserve TwoDArray(1, 1 To n)             '~~> TmpAr(1) : 1 for Col B, 0 would be A             TwoDArray(1, n) = TmpAr(1)         End If     Next i      For i = 1 To n         Debug.Print TwoDArray(1, i)     Next i End Sub 

You will get the output as shown below

enter image description here

BTW, I am curious that since you are doing this in Excel, why not use inbuilt Workbooks.Open or QueryTables method and then read the range into a 2D array? That would be much simpler...

like image 25
Siddharth Rout Avatar answered Sep 27 '22 19:09

Siddharth Rout


OK, after looking into this, the solution I have arived at is to use ADODB (requires reference to ActiveX Data Objects, this loads the csv file into array without cycling the rows columns. Does require the data to be in good condition.

Sub LoadCSVtoArray()

strPath = ThisWorkbook.Path & "\"

Set cn = CreateObject("ADODB.Connection")
strcon = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=" & strPath & ";Extended Properties=""text;HDR=Yes;FMT=Delimited"";"
cn.Open strcon
strSQL = "SELECT * FROM SAMPLE.csv;"

Dim rs As Recordset
Dim rsARR() As Variant

Set rs = cn.Execute(strSQL)
rsARR = WorksheetFunction.Transpose(rs.GetRows)
rs.Close
Set cn = Nothing

[a1].Resize(UBound(rsARR), UBound(Application.Transpose(rsARR))) = rsARR

End Sub
like image 21
Steve Avatar answered Sep 27 '22 20:09

Steve


To get a known format csv data file into a 2D array I finally adopted the following method, which seems to work well and is quite quick. I decided that file read operations are fairly fast nowadays, so I run a first pass on the csv file to get the size required for both dimension of the array. With the array suitably dimensioned it is then a simple task to re-read the file, line by line, and populate the array.

Function ImportTestData(ByRef srcFile As String, _
                        ByRef dataArr As Variant) _
                        As Boolean

Dim FSO As FileSystemObject, Fo As TextStream
Dim line As String, Arr As Variant
Dim lc As Long, cc As Long
Dim i As Long, j As Long

ImportTestData = False
Set FSO = CreateObject("Scripting.FilesystemObject")
Set Fo = FSO.OpenTextFile(srcFile)

' First pass; read the file to get array size
lc = 0 ' Counter for number of lines in the file
cc = 0 ' Counter for number of columns in the file
While Not Fo.AtEndOfStream  ' Read the csv file line by line
    line = Fo.ReadLine
    If lc = 0 Then ' Count commas to get array's 2nd dim index
        cc = 1 + Len(line) - Len(Replace(line, ",", ""))
    End If
    lc = lc + 1
Wend
Fo.Close

' Set array dimensions to accept file contents
ReDim dataArr(0 To lc - 1, 0 To cc - 1)
'Debug.Print "CSV has "; n; " rows with "; lc; " fields/row"
If lc > 1 And cc > 1 Then
    ImportTestData = True
End If

' Second pass; Re-open data file and copy to array
Set Fo = FSO.OpenTextFile(srcFile)
lc = 0
While Not Fo.AtEndOfStream
    line = Fo.ReadLine
    Arr = Split(line, ",")
    For i = 0 To UBound(Arr)
        dataArr(lc, i) = Arr(i)
    Next i
    lc = lc + 1
Wend

End Function   'ImportTestData()

I created this as a Function rather than a Sub to get a simple return value, if required. Reading a file with 8,500 rows of 20 columns takes approximately 180ms.
This method assumes that the structure (number of delimiters) of the CSV file is the same for every row, typical of a data logging application.

like image 21
NigelH Avatar answered Sep 27 '22 18:09

NigelH