Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python implementation of the Sequitur algorithm?

I need to use the Sequitur algorithm for a bioinformatics project but so far I haven't found any Python implementations of the algorithm. This is what I've got so far:

    def new_rule(sequence,rules,x,y,z):
    y = list(y)
    if max(y) > 1:
        ind = y.index(max(y))

        locations = np.zeros(max(y))

        counter = 0
        for i, j in enumerate(z):
            if j == x[ind]:
                locations[counter] = i
                counter+=1

    new_rule = 'A'+str(len(rules)+1)
    rules.append([new_rule,x[ind]])

    return sequence.replace(x[ind],new_rule), rules

def word_count(sequence):
    return len(sequence)-sequence.count(' ')

def recursive_sequitur(sequence,rules):
    count_down = word_count(sequence)-1

x,y,z = n_grams(sequence,count_down)

while count_down>1:
    if len(y) > 1 and max(y) > 1:
        sequence, rules = new_rule(sequence,rules,x,y,z)
        x,y,z = n_grams(sequence,count_down)

    else:
        count_down-=1
        x,y,z = n_grams(sequence,count_down)

return sequence, rules

1 Answers

Have you seen Parallel Sequitur for Python?

We implemented serial and parallel versions of the Sequitur compression algorithm. The Sequitur algorithm uses hierarchical structure and sequences of discrete symbols to compress files by exploiting repetative structures found in strings.

concat_parallel.py: simple parallel implementation of the Sequitur algorithm. Text is split among workers which run a version of the serial algorithm before the main process merges all results in order.

like image 94
MarkokraM Avatar answered Jun 15 '26 06:06

MarkokraM