Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to optimize vlookup for high search count ? (alternatives to VLOOKUP)

Tags:

I am looking for alternatives to vlookup, with improved performance within the context of interest.

The context is the following:

  • I have a data set of {key;data} which is big (~ 100'000 records)
  • I want to perform a lot of VLOOKUP operations on the dataset (typical use is to reorder the whole dataset)
  • My data set has no duplicate keys
  • I am looking only for exact matches (last argument to VLOOKUP is FALSE)

A schema to explain :

Reference sheet : ("sheet1")

        A           B      1      2  key1        data1      3  key2        data2      4  key3        data3    ...  ...         ...  99999  key99998    data99998 100000  key99999    data99999 100001  key100000   data100000 100002 

Lookup sheet:

        A           B      1      2  key51359    =VLOOKUP(A2;sheet1!$A$2:$B$100001;2;FALSE)      3  key41232    =VLOOKUP(A3;sheet1!$A$2:$B$100001;2;FALSE)      4  key10102    =VLOOKUP(A3;sheet1!$A$2:$B$100001;2;FALSE)    ...  ...         ...  99999  key4153     =VLOOKUP(A99999;sheet1!$A$2:$B$100001;2;FALSE) 100000  key12818    =VLOOKUP(A100000;sheet1!$A$2:$B$100001;2;FALSE) 100001  key35032    =VLOOKUP(A100001;sheet1!$A$2:$B$100001;2;FALSE) 100002 

On my Core i7 M 620 @2.67 GHz, this computes in ~10 minutes

Are there alternatives to VLOOKUP with better performance in this context ?

like image 794
d-stroyer Avatar asked Sep 06 '13 11:09

d-stroyer


People also ask

Is there a better alternative to VLOOKUP?

#1 – LOOKUP Function as VLOOKUP Alternatives The lookup function is better than VLOOKUP because it is less restrictive. It was first introduced in MS 2016. You can search data both vertically and horizontally. It also allows left to right and right to left lookup.

What are the prerequisites for using VLOOKUP what are the alternatives of VLOOKUP?

1) INDEX-MATCH: If you don't have an Office 365 subscription, INDEX-MATCH is your best alternative to VLOOKUP. INDEX formula provides you with the exact location of a cell in a range. By nesting the MATCH formula in INDEX, you can replace VLOOKUP in a much more robust way.

Is Xlookup or VLOOKUP more efficient?

For larger data sets, XLOOKUP is likely the better option for you due to its increased functionality. VLOOKUP has several limitations that make it difficult to use for complex data—and as a result, XLOOKUP has much more flexibility, making it the better choice for most spreadsheet users.

What is faster Countif or VLOOKUP?

The VLOOKUP can be faster as it only searches the first accordance of "this months ID" in "column of next months ID". COUNTIF must count all occurrences of "this months ID" in "column of next months ID". So it ever must loop over the whole "column of next months ID".


1 Answers

I considered the following alternatives:

  • VLOOKUP array-formula
  • MATCH / INDEX
  • VBA (using a dictionary)

The compared performance is:

  • VLOOKUP simple formula : ~10 minutes
  • VLOOKUP array-formula : ~10 minutes (1:1 performance index)
  • MATCH / INDEX : ~2 minutes (5:1 performance index)
  • VBA (using a dictionary) : ~6 seconds (100:1 performance index)

Using the same reference sheet

1) Lookup sheet: (vlookup array formula version)

         A          B      1      2   key51359    {=VLOOKUP(A2:A10001;sheet1!$A$2:$B$100001;2;FALSE)}      3   key41232    formula in B2      4   key10102    ... extends to    ...   ...         ...   99999   key4153     ... cell B100001 100000   key12818    ... (select whole range, and press 100001   key35032    ... CTRL+SHIFT+ENTER to make it an array formula) 100002 

2) Lookup sheet: (match+index version)

         A           B                                       C       1       2  key51359    =MATCH(A2;sheet1!$A$2:$A$100001;)       =INDEX(sheet1!$B$2:$B$100001;B2)       3  key41232    =MATCH(A3;sheet1!$A$2:$A$100001;)       =INDEX(sheet1!$B$2:$B$100001;B3)       4  key10102    =MATCH(A4;sheet1!$A$2:$A$100001;)       =INDEX(sheet1!$B$2:$B$100001;B4)     ...  ...         ...                                     ...   99999  key4153     =MATCH(A99999;sheet1!$A$2:$A$100001;)   =INDEX(sheet1!$B$2:$B$100001;B99999)  100000  key12818    =MATCH(A100000;sheet1!$A$2:$A$100001;)  =INDEX(sheet1!$B$2:$B$100001;B100000)  100001  key35032    =MATCH(A100001;sheet1!$A$2:$A$100001;)  =INDEX(sheet1!$B$2:$B$100001;B100001)  100002 

3) Lookup sheet: (vbalookup version)

       A          B      1      2  key51359    {=vbalookup(A2:A50001;sheet1!$A$2:$B$100001;2)}      3  key41232    formula in B2      4  key10102    ... extends to    ...  ...         ...  50000  key91021    ...   50001  key42       ... cell B50001  50002  key21873    {=vbalookup(A50002:A100001;sheet1!$A$2:$B$100001;2)}  50003  key31415    formula in B50001 extends to    ...  ...         ...  99999  key4153     ... cell B100001 100000  key12818    ... (select whole range, and press 100001  key35032    ... CTRL+SHIFT+ENTER to make it an array formula) 100002 

NB : For some (external internal) reason, the vbalookup fails to return more than 65536 data at a time. So I had to split the array formula in two.

and the associated VBA code :

Function vbalookup(lookupRange As Range, refRange As Range, dataCol As Long) As Variant   Dim dict As New Scripting.Dictionary   Dim myRow As Range   Dim I As Long, J As Long   Dim vResults() As Variant    ' 1. Build a dictionnary   For Each myRow In refRange.Columns(1).Cells     ' Append A : B to dictionnary     dict.Add myRow.Value, myRow.Offset(0, dataCol - 1).Value   Next myRow    ' 2. Use it over all lookup data   ReDim vResults(1 To lookupRange.Rows.Count, 1 To lookupRange.Columns.Count) As Variant   For I = 1 To lookupRange.Rows.Count     For J = 1 To lookupRange.Columns.Count       If dict.Exists(lookupRange.Cells(I, J).Value) Then         vResults(I, J) = dict(lookupRange.Cells(I, J).Value)       End If     Next J   Next I    vbalookup = vResults End Function 

NB: Scripting.Dictionary requires a referenc to Microsoft Scripting Runtime which must be added manually (Tools->References menu in the Excel VBA window)

Conclusion :

In this context, VBA using a dictionary is 100x faster than using VLOOKUP and 20x faster than MATCH/INDEX

like image 162
d-stroyer Avatar answered Oct 18 '22 15:10

d-stroyer