Which programs/algorithms change the representation of their data structure at runtime in order to obtain beter performance?
Context: Data structures "define" how real-world concepts are structured and represented in computer memory. For different kinds of computations a different data structure should/can be used to achieve acceptable performance (e.g., linked-list versus array implementation).
Self-adaptive (cf. self-updating) data structures are data structures that change their internal state according to a concrete usage pattern (e.g., self balancing trees). These changes are internal, i.e., depending the data. Moreover, these changes are anticipated upon by design.
Other algorithms can benefit from an external change of representation. In matrix-multiplication, for instance, it is a well know performance trick to transpose "the second matrix" (such that caches are used more efficiently). This is actually changing the matrix representation from row-major to column major order. Because "A" is not the same as "Transposed(A)", the the second matrix is transposed again after the multiplication to keep the program semantically correct.
A second example is using a linked-list at program start-up to populate "the data structure" and change to an array based implementation once the content of the list becomes "stable".
I am looking for programmers that have similar experiences with other example programs where an external change of representation is performed in their application in order to have better performance. Thus, where the representation (chosen implementation) of a data structure is changed at runtime as an explicit part of the program.
In Dynamic data structure the size of the structure in not fixed and can be modified during the operations performed on it. Dynamic data structures are designed to facilitate change of data structures in the run time.
Also, a data structure can hold different kinds and types of data within one single object. Data type examples are int, float, double, etc. Data structure examples are stack, queue, tree, etc.
self-updating) data structures are data structures that change their internal state according to a concrete usage pattern (e.g., self balancing trees). These changes are internal, i.e., depending the data. Moreover, these changes are anticipated upon by design.
Arrays. An array is the simplest and most widely used data structure. Other data structures like stacks and queues are derived from arrays.
The pattern of transforming the input representation in order to enable a more efficient algorithm comes up in many situations. I would go as far as to say this is an important way to think about designing efficient algorithms in general. Some examples that come to mind:
HeapSort. It works by transforming your original input list into a binary heap (probably a min-heap), and then repeatedly calling the remove-min function to get the list elements in sorted order. Asymptotically, it is tied for the fastest comparison-based sorting algorithm.
Finding duplicates in a list. Without changing the input list, this will take O(n^2) time. But if you can sort the list, or store the elements in a hash table or Bloom filter, you can find all the duplicates in O(n log n) time or better.
Solving a linear program. A linear program (LP) is a certain kind of optimization problem with many applications in economics and elsewhere. One of the most important techniques in solving LPs is duality, which means converting your original LP into what is called the "dual", and then solving the dual. Depending on your situation, solving the dual problem may be much easier than solving the original ("primal") LP. This book chapter starts with a nice example of primal/dual LPs.
Multiplying very large integers or polynomials. The fastest known method is using the FFT; see here or here for some nice descriptions. The gist of the idea is to convert from the usual representation of your polynomial (a list of coefficients) to an evaluation basis (a list of evaluations of that polynomial at certain carefully-chosen points). The evaluation basis makes multiplication trivial - you can just multiply each pair of evaluations. Now you have the product polynomial in an evaluation basis, and you interpolate (opposite of evaluation) to get back the coefficients, like you wanted. The Fast Fourier Transform (FFT) is a very efficient way of doing the evaluation and interpolation steps, and the whole thing can be much faster than working with the coefficients directly.
Longest common substring. If you want to find the longest substring that appears in a bunch of text documents, one of the fastest ways is to create a suffix tree from each one, then merge them together and find the deepest common node.
Linear algebra. Various matrix computations are performed most efficiently by converting your original matrix into a canonical form such as Hermite normal form or computing a QR factorization. These alternate representations of the matrix make standard things such as finding the inverse, determinant, or eigenvalues much faster to compute.
There are certainly many examples besides these, but I was trying to come up with some variety.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With