Python implementation of the Sequitur algorithm?

Question

I need to use the Sequitur algorithm for a bioinformatics project but so far I haven't found any Python implementations of the algorithm. This is what I've got so far:

    def new_rule(sequence,rules,x,y,z):
    y = list(y)
    if max(y) > 1:
        ind = y.index(max(y))

        locations = np.zeros(max(y))

        counter = 0
        for i, j in enumerate(z):
            if j == x[ind]:
                locations[counter] = i
                counter+=1

    new_rule = 'A'+str(len(rules)+1)
    rules.append([new_rule,x[ind]])

    return sequence.replace(x[ind],new_rule), rules

def word_count(sequence):
    return len(sequence)-sequence.count(' ')

def recursive_sequitur(sequence,rules):
    count_down = word_count(sequence)-1

x,y,z = n_grams(sequence,count_down)

while count_down>1:
    if len(y) > 1 and max(y) > 1:
        sequence, rules = new_rule(sequence,rules,x,y,z)
        x,y,z = n_grams(sequence,count_down)

    else:
        count_down-=1
        x,y,z = n_grams(sequence,count_down)

return sequence, rules

MarkokraM · Accepted Answer

Have you seen Parallel Sequitur for Python?

We implemented serial and parallel versions of the Sequitur compression algorithm. The Sequitur algorithm uses hierarchical structure and sequences of discrete symbols to compress files by exploiting repetative structures found in strings.

concat_parallel.py: simple parallel implementation of the Sequitur algorithm. Text is split among workers which run a version of the serial algorithm before the main process merges all results in order.

Python implementation of the Sequitur algorithm?

Tags:

compression

sequences

1 Answers

MarkokraM

Recent Activity

Donate For Us

Python implementation of the Sequitur algorithm?

Tags:

compression

sequences

1 Answers

MarkokraM

Related questions

Recent Activity

Donate For Us