I'm looking for a set data structure to use in Excel VBA. What I found so far is Scripting.Dictionary which seems to be a map.
Is there also something like a set in VBA?
Basically I'm looking for a data structure that is efficient for finding out if a particular value has already been added.
You could use a collection and do the following function, collections enforce unique key identifiers:
Public Function InCollection(Col As Collection, key As String) As Boolean
Dim var As Variant
Dim errNumber As Long
InCollection = False
Set var = Nothing
Err.clear
On Error Resume Next
var = Col.Item(key)
errNumber = CLng(Err.Number)
On Error GoTo 0
'5 is not in, 0 and 438 represent incollection
If errNumber = 5 Then ' it is 5 if not in collection
InCollection = False
Else
InCollection = True
End If
End Function
Simply write a wrapper for Scripting.Dictionary
that exposes only set-like operations.
clsSet
Option Explicit
Private d As Scripting.Dictionary
Private Sub Class_Initialize()
Set d = New Scripting.Dictionary
End Sub
Public Sub Add(var As Variant)
d.Add var, 0
End Sub
Public Function Exists(var As Variant) As Boolean
Exists = d.Exists(var)
End Function
Public Sub Remove(var As Variant)
d.Remove var
End Sub
And then you can use it like so:
mdlMain
Public Sub Main()
Dim s As clsSet
Set s = New clsSet
Dim obj As Object
s.Add "A"
s.Add 3
s.Add #1/19/2017#
Debug.Print s.Exists("A")
Debug.Print s.Exists("B")
s.Remove #1/19/2017#
Debug.Print s.Exists(#1/19/2017#)
End Sub
Which prints True, False and False as expected.
Take a look at .NET ArrayList, it has such methods as Add
, Contains
, Sort
etc. You can instantiate the object within VBS and VBA environment:
Set ArrayList = CreateObject("System.Collections.ArrayList")
Scripting.Dictionary
also may fit the needs, it has unique keys, Exists
method allows to check if a key is already in the dictionary.
However, SQL request via ADODB probably will be more efficient for that case. The below examples shows how to retrieve unique rows via SQL query to the worksheet:
Option Explicit
Sub GetDistinctRecords()
Dim strConnection As String
Dim strQuery As String
Dim objConnection As Object
Dim objRecordSet As Object
Select Case LCase(Mid(ThisWorkbook.Name, InStrRev(ThisWorkbook.Name, ".")))
Case ".xls"
strConnection = "Provider=Microsoft.Jet.OLEDB.4.0;User ID=Admin;Data Source='" & ThisWorkbook.FullName & "';Mode=Read;Extended Properties=""Excel 8.0;HDR=YES;"";"
Case ".xlsm", ".xlsb"
strConnection = "Provider=Microsoft.ACE.OLEDB.12.0;User ID=Admin;Data Source='" & ThisWorkbook.FullName & "';Mode=Read;Extended Properties=""Excel 12.0 Macro;HDR=YES;"";"
End Select
strQuery = "SELECT DISTINCT * FROM [Sheet1$]"
Set objConnection = CreateObject("ADODB.Connection")
objConnection.Open strConnection
Set objRecordSet = objConnection.Execute(strQuery)
RecordSetToWorksheet Sheets(2), objRecordSet
objConnection.Close
End Sub
Sub RecordSetToWorksheet(objSheet As Worksheet, objRecordSet As Object)
Dim i As Long
With objSheet
.Cells.Delete
For i = 1 To objRecordSet.Fields.Count
.Cells(1, i).Value = objRecordSet.Fields(i - 1).Name
Next
.Cells(2, 1).CopyFromRecordset objRecordSet
.Cells.Columns.AutoFit
End With
End Sub
Source data should be placed on the Sheet1
, the result is output to the Sheet2
. The only limitation for that method is that ADODB connects to the Excel workbook on the drive, so any changes should be saved before query to get actual results.
If you want to get only the set of non-distinct rows, then the query should be as follows (just an example, you have to put your set of fields into query):
strQuery = "SELECT CustomerID, CustomerName, ContactName, Address, City, PostalCode, Country FROM [Sheet1$] GROUP BY CustomerID, CustomerName, ContactName, Address, City, PostalCode, Country HAVING Count(*) > 1"
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With