I have this UDF
which I use to lookup dates and return values based on the condition.
Basically two(2) conditions only, either <
or >
the date.
Also, I just use built-in Excel Functions as well and just added some conditions.
Public Function CLOOKUP(lookup_value, table_array As Range, column_index As Long, _
rv_operator, reference_value, Optional range_lookup, _
Optional return_index) As Variant
Dim NT_array, S_array
Dim ORGLOOKUP, REFLOOKUP
Dim row_count As Long, row_less As Long
With Application.WorksheetFunction
If column_index > 0 And column_index <= table_array.Columns.Count Then
On Error Resume Next
ORGLOOKUP = .VLookup(lookup_value, table_array, column_index, range_lookup)
If Err.number <> 0 Then CLOOKUP = CVErr(xlErrNA): Exit Function
On Error GoTo 0
Select Case rv_operator
Case "<"
Do While ORGLOOKUP > reference_value
Set NT_array = table_array.Resize(, 1)
row_count = .CountA(NT_array)
Set S_array = table_array.Resize(row_count)
row_less = .Match(lookup_value, NT_array, 0)
Set table_array = S_array.Offset(row_less, 0).Resize(row_count - row_less)
On Error Resume Next
ORGLOOKUP = .VLookup(lookup_value, table_array, column_index, range_lookup)
If Err.number <> 0 Then CLOOKUP = CVErr(xlErrNA): Exit Function
On Error GoTo 0
Loop
Case ">"
Do While ORGLOOKUP < reference_value
Set NT_array = table_array.Resize(, 1)
row_count = .CountA(NT_array)
Set S_array = table_array.Resize(row_count)
row_less = .Match(lookup_value, NT_array, 0)
Set table_array = S_array.Offset(row_less, 0).Resize(row_count - row_less)
On Error Resume Next
ORGLOOKUP = .VLookup(lookup_value, table_array, column_index, range_lookup)
If Err.number <> 0 Then CLOOKUP = CVErr(xlErrNA): Exit Function
On Error GoTo 0
Loop
Case Else
CLOOKUP = CVErr(xlErrNA)
End Select
Select Case True
Case IsMissing(return_index)
CLOOKUP = ORGLOOKUP
Case Else
If return_index <= table_array.Columns.Count Then
REFLOOKUP = .VLookup(lookup_value, table_array, return_index, range_lookup)
CLOOKUP = REFLOOKUP
Else
CLOOKUP = CVErr(xlErrNA)
End If
End Select
Else
CLOOKUP = CVErr(xlErrNA)
End If
End With
End Function
It works fine but I want to optimize it a bit to improve calculation speed.
Usually I'm using this to lookup 10k rows in an excel file with 600k or more rows.
It takes 5~8 mins in a sorted data.
If someone can point me to the right direction on how to optimize this function, that would be great.
Edit1:
HERE is the workbook link.
Two(2) Sheets, Data Source and Data for Lookup, self-explanatory I guess.
I also included the function in the WB.
I used the function to populate the values on the Data for Lookup Sheet under the Manufacturing Date column and just leave the first cell with the actual formula to avoid problems in opening it.
For those who are not keen, here's the syntax on how to use the function:
lookup_value - what you are looking for
table_array - where you're looking
column_index - column from which you want to get information based on your lookup_value
rv_operator - criteria whether the value to return is less than or greater than the reference_value
reference_value - where your returned value is compared
range_lookup - exact or approximate match
return_index - alternative column index, just in case you need to return data aside from what get from the column_index
Remember that I use this to get DATES
so the column_index
always contains a date as well as the reference_value
.
That is the reason there is a return_index
since I may need to recover information that fall under the conditions but not actually interested in the dates.
So for example in my sample workbook, I need to get the manufacturing date of serial number 096364139403422056
but it shoud be less than the reference value 1/4/2014
.
There are more than one occurence of this serial number so I need to get the closest to the reference value.
The result should be 11/15/2013
using the function: =CLOOKUP(B2,'Source Data'!A:B,2,"<",A2,0)
Hope above explanation helps you guys a bit.
Btw, this can also be achieved using Array Formulas
.
I just made this formula for the benefit of the other users who are not well versed with AF's
.
I have created a solution that takes about 40 seconds in my laptop. My laptop takes about 7 minutes to copy the formula to all lookup rows.
When I measure the various bottlenecks in the original UDF, I found out that VLOOKUP is very expensive. Example using a row close to the bottom:
Since you can potentially call the above functions multiple times (when there is a duplicate), it is even more time consuming.
My solution is using a VBA macro instead of optimizing the UDF. Also, instead of using VLOOKUP, I am using the Scripting.Dictionary object to store all the serial numbers. Lookup using Scripting.Dictionary is 100 times faster according to How to optimize vlookup for high search count ? (alternatives to VLOOKUP).
I tested it on Office 2010 running on Windows 7. Loading all the serial numbers into the Dictionary takes about 37 seconds, while the lookup and populating column C takes about 3 seconds ! Therefore, it's not a problem at all to have more rows in the lookup worksheet !
If the macro complains when creating Scripting.Dictionary, you may need to add a reference to Microsoft Scripting Runtime (see the above link for details).
When I compare the result with your UDF formula, I found some inconsistency which may be due to a bug in your UDF code. For example:
In row 12739, serial number 096364139401213204, the reference date is 1/13/2013, the data is 1/3/2013 and 4/23/2013, but the result is #VALUE! So it looks like if ANY of the data is GREATER than the reference date, you want the result to be #VALUE!
HOWEVER, in row 12779, serial number 096364139508732708, the reference date is 1/9/2013, the data is 8/10/2013 and 1/2/2013, your UDF produces 1/2/2013 instead of #VALUE! even though there is a row with Mfg date greater than the reference date.
I don't know what behavior you want, so I assume you want to display #VALUE! when ANY of the data is greater than the reference date. If you want to change the behavior, please let me know, or update the code yourselves (I put copious comment in the code).
Here's a link to download the spreadsheet plus the Macro to: https://www.dropbox.com/s/djqvu0a4a6h5a06/Sample%20Workbook%20Optimized.xlsm . I'm going to make it available only for 1 week. The Macro code is below:
Option Explicit
Sub Macro1()
'
' Macro1 Macro
'
Const COMPARISONMODE = "<"
Const SOURCESHEETNAME = "Source Data"
Const LOOKUPSHEETNAME = "Data for Lookup"
Dim oSource
Set oSource = CreateObject("Scripting.Dictionary")
Dim starttime, endtime, totalindex
'BUILD THE INDEX in oSource
'Column A = serial number
'Column B = mfg date
'Column C = extra data
'Each item contains a comma separated list of row numbers
starttime = Timer
Sheets(SOURCESHEETNAME).Activate
Dim rownum, serialno, mfgdate
rownum = 2
Do
serialno = Cells(rownum, 1)
If Not IsError(serialno) Then
serialno = CStr(serialno)
If serialno = "" Then Exit Do
If oSource.Exists(serialno) Then
oSource(serialno) = oSource(serialno) & "," & rownum
Else
oSource.Add serialno, CStr(rownum)
End If
End If
rownum = rownum + 1
Loop
endtime = Timer
totalindex = endtime - starttime
starttime = Timer
'DO THE LOOKUP
'NOTE: Assume that there are no #VALUE! in columns A and B of the lookup table
Dim rownumlist, sourcerownum, aryRownumlist, refdate, closestmfgdate, closestextradata, j
Sheets(LOOKUPSHEETNAME).Activate
rownum = 2
Do
refdate = CDate(Cells(rownum, 1))
serialno = Cells(rownum, 2)
If serialno = "" Then Exit Do
If Not oSource.Exists(serialno) Then
Cells(rownum, 3) = CVErr(xlErrNA)
GoTo ContinueLoop
End If
aryRownumlist = Split(oSource(serialno), ",")
closestmfgdate = ""
closestextradata = ""
'Find the closest manufacturing date to the reference date out of all matches
For j = LBound(aryRownumlist) To UBound(aryRownumlist)
sourcerownum = CLng(aryRownumlist(j))
mfgdate = Sheets(SOURCESHEETNAME).Cells(sourcerownum, 2)
If IsError(mfgdate) Then Exit For 'if any of the date in the matches is not valid, output N/A
mfgdate = CDate(mfgdate)
'Exclude depending on COMPARISONMODE
'must be less than the reference date if COMPARISONMODE = "<", otherwise it has to be greater than
'If comparison failed for ANY of the matches, we will output N/A
'If you want the failed comparison match to be excluded but still output a date, instead of doing
' Exit For, you can do Goto ContinueFor. Example:
' If mfgdate >= refdate Then Goto ContinueFor
'QUESTION: What to do if it is equal? Assume that we will output N/A as well
If COMPARISONMODE = "<" Then
If mfgdate >= refdate Then closestmfgdate = "": Exit For
Else
If mfgdate <= refdate Then closestmfgdate = "": Exit For
End If
'Now check whether it is closer to refdate
If closestmfgdate = "" Then
closestmfgdate = mfgdate
closestextradata = Sheets(SOURCESHEETNAME).Cells(sourcerownum, 3)
ElseIf Abs(DateDiff("d", closestmfgdate, refdate)) > Abs(DateDiff("d", mfgdate, refdate)) Then
closestmfgdate = mfgdate
closestextradata = Sheets(SOURCESHEETNAME).Cells(sourcerownum, 3)
End If
ContinueFor:
Next
If closestmfgdate = "" Then
Cells(rownum, 3) = CVErr(xlErrNA)
Cells(rownum, 4) = ""
Else
Cells(rownum, 3) = closestmfgdate
Cells(rownum, 4) = closestextradata
End If
ContinueLoop:
rownum = rownum + 1
Loop
endtime = Timer
MsgBox "Indexing time=" & totalindex & " seconds; lookup time=" & (endtime - starttime) & " seconds"
End Sub
If you find the above solution satisfactory, please award the bounty or at least accept the solution. Thanks.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With