Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data structure / Algorithm for section lookup

Tags:

algorithm

What kind of datastructure / algorithm should I use for looking up which section currently in, given a list of endpoints of each section?

For example, if I have a webpage with section headers and content,

  • Introduction (ends at 100px)
  • Section 1 (ends at 350px)
  • Section 2 (ends at 700px)
  • Conclusion (ends at 1200px)
  • comments

and I'm currently at 130px, it should return that I'm currently in "Section 1".

Option 1

Binary search through array of endpoints

from bisect import bisect_left

arr = [100, 350, 700, 1200]
pos = bisect_left(arr, 130, 0, arr[-1])

However, this could still take O(log n) for every change in position.

Option 2

Hash table lookup of current location,

lookup = {0: "Introduction"
          1: "Introduction"
          ...
          10: "Section 1"
          11: "Section 1"
          ...
         }
section = lookup[130/10]

This is fast, but it wastes a lot of space


Is there a general data structure / algorithm that deals with this type of problem?

like image 609
Lee Chou Avatar asked Oct 31 '22 17:10

Lee Chou


1 Answers

I like your first option, binary searches are very efficient for scanning and as you say the second option is not space efficient.

The traditional and very generic solution that scales in computer graphics is a 2d k-tree, which creates a tree that can be looked up by coordinates without wasting memory. Specifically its search, removal and insert complexities are all O(log n) and its space complexity is O(n).

Given that you are only doing one axis though and a web page will tend to have 1-100 sections (and is unlikely to have thousands, let alone millions, or billions of sections) then I would personally consider going with a very simple array and then move to a more complex k-tree when there is a measurable benefit/need. If you are writing this in C or another language that gives you some control on the memory layout, then an array of structures would be exceptionally fast to scan due to the design of modern cpus and memory hierarchies (specifically prefetchers and caching).

like image 52
Chris K Avatar answered Nov 08 '22 04:11

Chris K