I have a set of input sequences (represented as lists), for each of which I'm generating a set of all of its subsequences (also lists). These subsequences are stored as keys in an EQUAL hash-table, and never garbage collected or modified (the associated values, however, are modified).
My problem is that the method I'm currently using to accomplish this uses far more heap space than I'd like.
To be clear about what I'm talking about, suppose we have a sequence:
#1=(A B C D)
The subsequences are:
#2=()
#3=(A)
#4=(A B)
#5=(A B C)
#6=(A B C D)
#7=(B)
#8=(B C)
#9=(B C D)
#10=(C)
#11=(C D)
#12=(D)
The following code, which I admit isn't very good, produces this set (except the empty subsequence which I don't actually care about):
(defun subseqs-of-length (length sequence)
(if (< (length sequence) length)
nil
(loop for start from 0 to (- (length sequence) length)
collect (subseq sequence start (+ start length)))))
(defun subseqs-of-length-< (length sequence)
(loop for len from 1 to (1- length)
append (subseqs-of-length len sequence)))
(defun all-subseqs (sequence)
(subseqs-of-length-< (1+ (length sequence)) sequence))
As you can probably imagine, with a lot of input sequences, this uses up a huge amount of heap.
The most obvious way I can think of to save memory is to share a bunch of list structure; for instance, you can build list 11 by (cons 'c '#12#)
, list 9 by (cons 'b '#11#)
, list 8 by (cons 'b '#10#
and so forth. It would be even better if the output of multiple calls to all-subseqs
could share structure. But I can't think of an elegant way to write this.
My question is two-fold:
all-subseqs
so that its results will share structure that way?What is the minimal amount of information needed to reconstruct a subsequence?
How about the sequence, a start index, and end index. Store the start/end indexes in a two-dimensional hash table (a hash table of hash tables), or a 2-d array. Then write functions to do work on that data structure (reconstruct the subsequence given the sequence, a start index, and end index, etc).
Another idea: If you really want to store the subsequences in the data structure, you could use an N-dimensional hash table (a hash table of hash tables of hash tables ...), where N is the length of the sequence. Instead of a linked list of elements, you get a linked tree of hash tables. There is a special key in each hash table, storing the actual value of the keyed sequence all the way up to that point (that is, if this keyed sequence actually has a value). And other links in each hash table point to branches further down the tree.
And if each element in the list is a symbol, then an 'a in one list will be eq (share same mem location) as an 'a in another list. Which I think means you can get O(N) storage growth + whatever incremental growth occurs by adding links to the data structure.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With