Is there a faster way to do this?
Set data = ws.UsedRange Set unique = CreateObject("Scripting.Dictionary") On Error Resume Next For x = 1 To data.Rows.Count unique.Add data(x, some_column_number).Value, 1 Next x On Error GoTo 0
At this point unique.keys
gets what I need, but the loop itself seems to be very slow for files that have tens of thousands of records (whereas this wouldn't be a problem at all in a language like Python or C++ especially).
In Excel, there are several ways to filter for unique values—or remove duplicate values: To filter for unique values, click Data > Sort & Filter > Advanced. To remove duplicate values, click Data > Data Tools > Remove Duplicates.
VBA Remove duplicates – Example #1Step 1: Start the subprocedure by giving a macro code a name. Step 2: Mention the range of data by using the VBA Range object. Step 3: After mentioning the range access VBA “RemoveDuplicates” method. Step 4: First argument in which column we need to remove the duplicate values.
Select Text option from the Formula Type drop down list; Then choose Extract cells with unique values (include the first duplicate) from the Choose a fromula list box; In the right Arguments input section, select a list of cells that you want to extract unique values.
Use Excel's AdvancedFilter function to do this.
Using Excels inbuilt C++ is the fastest way with smaller datasets, using the dictionary is faster for larger datasets. For example:
Copy values in Column A and insert the unique values in column B:
Range("A1:A6").AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Range("B1"), Unique:=True
It works with multiple columns too:
Range("A1:B4").AdvancedFilter Action:=xlFilterCopy, CopyToRange:=Range("D1:E1"), Unique:=True
Be careful with multiple columns as it doesn't always work as expected. In those cases I resort to removing duplicates which works by choosing a selection of columns to base uniqueness. Ref: MSDN - Find and remove duplicates
Here I remove duplicate columns based on the third column:
Range("A1:C4").RemoveDuplicates Columns:=3, Header:=xlNo
Here I remove duplicate columns based on the second and third column:
Range("A1:C4").RemoveDuplicates Columns:=Array(2, 3), Header:=xlNo
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With