Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Algorithm for covering memory ranges?

I have a set of ranges such as {(2-8), (13-22), (380-7931), (40032-63278)}. For the sake of simplicity we may assume that they do not overlap (overlapping ranges have already been merged).

My goal is to "cover" these ranges using combinations of a given set of "lengths" such as {4, 64, 1024, 16384}. I am constrained to use at most N lengths, such as N=32. I do not care how many "lengths" I use as long as I'm below my maximum, but I want to minimize total "extra" area - numbers "covered" by lengths that is not in the initial set of ranges.

Example {(2-8)} covered by (2-66) (one length of 64 used) has 58 "extra" numbers. {(2-8)} covered by {(2-6), (6-10)} (two lengths of 4) has only 2 "extra" numbers and is preferable.

My real world application involves pre-programming a fixed MMU TLB to ensure that only certain ranges of memory addresses are accessible (TLB misses thus represent a forbidden access & can be dealt with accordingly). I'd like to cover the ranges as tightly as possible so violations are caught sooner rather than later, but I only have 32 slots to work with and 4 fixed page sizes. I can tune my code by hand to an adequate level of performance, but I'm curious if there's a more elegant/general purpose solution to the problem. It seems related to the knapsack problem but different enough it's been difficult to search for.

like image 742
laughingcoyote Avatar asked Sep 24 '12 15:09

laughingcoyote


1 Answers

This may be formulated as variation of Shortest path problem.

We need to cover a set of ranges of total length M with at most N pages. Pages may have L different lengths, they are not aligned (may be placed at any address). Difference between total "extra" area and total length of pages is equal to the constant M, which allows to minimize total length of pages.

Let's construct a graph, related to this problem. Each memory address in any range has corresponding vertex in the graph. Each vertex has L outgoing edges, corresponding to L pages, starting at given address. Length of each edge is equal to page length. Each edge comes to some vertex in the graph, depending on where corresponding page ends:

  1. Page ends in some unoccupied position. Edge comes to the vertex, corresponding to starting address of the first range after this position.
  2. Page ends inside some range. Edge comes to the vertex, corresponding to end address of the page.
  3. Page ends in unoccupied position, with address, larger than address of any range. Edge comes to destination vertex. (Starting address of the very first range corresponds to source vertex, and we should find shortest path between these two vertices).

Since the resulting graph is a DAG, shortest path may be found in linear time, by processing the vertices in a topological order (or, even simpler, in order of corresponding memory addresses).

For each vertex keep an array of N pairs {path-length, back-pointer}. While shortest path algorithm visits any vertex, index this array with the number of hops in the path, and if the path is shorter than stored path-length, replace {path-length, back-pointer}. When every vertex is processed, find shortest path in the array of pairs, belonging to destination vertex, then reconstruct path using back-pointers. This path describes optimal cover.

Worst case time complexity O(L*M*N) is determined by maximal number of edges (L*M) and number of elements in the array, belonging to each vertex (N). If ranges are sparse, most edges come to starting address of some range, most vertices, corresponding to internal addresses are unused, and time complexity is much less.

This algorithm requires O(M*N) space, but for sparse ranges this may be significantly decreased if we put all the graph vertices (or maybe all length/pointer pairs) into a hash map.

like image 152
Evgeny Kluev Avatar answered Oct 07 '22 05:10

Evgeny Kluev