Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Mysterious use of LOOKUP

I've been using this Excel formula for years. It works fine, but I want to understand why it works! The formula is used to find the last associated value on a list. For instance:

Macro usage example.

Formula in C14: =LOOKUP(2,1/(B1:B12="meat"),C1:C12)

Here, the formula in C14 looks in column B for the latest "meat" labeled cell and returns the associated value in the C column. It finds "meat" at B9 and correspondingly returns the value at C9.

The most puzzling part of the formula is "1/(....)". What is this division? Where does this syntax come from? Can it be used elsewhere? Why is the lookup value 2?

like image 943
Guillermo Prandi Avatar asked Sep 21 '15 19:09

Guillermo Prandi


People also ask

What is a LOOKUP used for?

Use LOOKUP, one of the lookup and reference functions, when you need to look in a single row or column and find a value from the same position in a second row or column. For example, let's say you know the part number for an auto part, but you don't know the price.

How does the LOOKUP function work in Excel?

Lookup functions in Excel mean referencing a cell to match values in another row or column against the cell and thereby retrieving the corresponding results from the respective rows and columns.

What is LOOKUP value in Excel?

Lookup functions in Excel are used for looking through a single column or row to find a particular value from the same place in a second column or row. This often takes place when there are multiple worksheets within a workbook or a large amount of data in a worksheet.


2 Answers

Here's what's happening: This

=LOOKUP(2,1/(B1:B12="meat"),C1:C12)

becomes this

=LOOKUP(2,1/{TRUE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;FALSE;TRUE;FALSE;FALSE;FALSE},C1:C12)

which becomes this

=LOOKUP(2,{1;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;#DIV/0!;1;#DIV/0!;#DIV/0!;#DIV/0!},C1:C12)

The B1:B12="meat" part is evaluated as an array of TRUE and FALSE values. When you do math on TRUE or FALSE it turns into 1 or 0. Dividing 1 by 1 or 0 returns 1 for all the trues and div/0 for all the falses.

Now that you know you're going to have an array full of exactly two things (either a 1 or a div/0), you can LOOKUP any number that's greater than 1 and it will return the last 1 in the list. You could =LOOKUP(800,...) and it will still return the "largest value that's less than or equal to the look up value", or in your case 1.

So the two is somewhat arbitrary - it's just a number that's greater than 1. The crux of it is to create the lookup array argument that includes only 1s and errors - LOOKUP ignores the errors.

Binary Search

I don't have official documentation on this, so here's the unofficial version. LOOKUP, HLOOKUP, and VLOOKUP have an argument where you tell the function if the data is sorted. If that argument is False, the function looks at every entry all the way until the end. If that argument is True or omitted, the function uses a binary search. It does that - the argument only exists - because a binary search is way faster than the ol' one-at-a-time search.

A binary search works by finding the middle value and evaluating it against the sought value. If the middle value is larger, everything to the right is discarded - the sought value must be to the left. It takes the left half and finds the middle value of that. If it's larger, it discards the right and keeps the left. Keep iterating until you find the value.

So why does LOOKUP(2,{1,1,#DIV/0!,1,1,#DIV/0!,1,1},...) find the last 1 instead of some arbitrary 1? I don't know exactly how MS implemented their binary search, so I have to make some assumptions. My assumptions are that error values are thrown out of the array and that when there's an even number of entries (no middle value), the value on the left of middle is used. Those assumptions may be wrong, but they have zero affect on the outcome.

First, throw out the errors. Now you have these entries and their original positions

1  1  1  1  1  1
1  2  4  5  7  8

The "middle" number is 1(4). It is less than 2 (sought value), so we throw out everything to the left and reprocess the right. Now we have

1  1  1
5  7  8

The middle value is 1(7). It is less than 2, so we throw out everything to the left and reprocess the right. Now we have

1
8

We're down to one entry, so that's the answer. If the sought value is higher than all the other values, a binary search will always return the last value.

like image 107
Dick Kusleika Avatar answered Sep 21 '22 14:09

Dick Kusleika


The logic goes like this:

  • B1:B12="meat" returns an array of booleans, TRUE where "meat" and FALSE otherwise
  • 1/TRUE returns 1, while 1/FALSE = 1/0 returns #DIV/0!, so you get an array filled with 1's and #DIV/0!'s as illustrated below. (This step is used to convert FALSE into an error value, for use in the next step.)

enter image description here

  • Doing a LOOKUP for the value 2 in that array fails, because there is no 2 there, and because the values are unsorted. So instead, LOOKUP looks for the last occurrence of a numerical value smaller than 2, ignoring any #DIV/0! error values. Why? Who knows. It is an undocumented feature, not mentioned in the documentation for the LOOKUP function

So essentially, this shenanigan relies on undocumented behaviour to return the desired result.

like image 38
Jean-François Corbett Avatar answered Sep 22 '22 14:09

Jean-François Corbett