Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Overcoming MemoryError / Slow Runtime in Ashton String task

In the Ashton String task, the goal is to:

Arrange all the distinct substrings of a given string in lexicographical order and concatenate them. Print the Kth character of the concatenated string. It is assured that given value of K will be valid i.e. there will be a Kth character.

The Input Format:

First line will contain a number T i.e. number of test cases. First line of each test case will contain a string containing characters (a−z) and second line will contain a number K.

The Output Format:

Print Kth character ( the string is 1 indexed )

And the Constraints are

1 ≤ T ≤ 5
1 ≤ length ≤ 105
K will be an appropriate integer.

For example, given the input:

1
dbac
3

The output would be: c

I've attempted the task with this code and it works for relatively short strings:

from itertools import chain

def zipngram(text, n=2):
    words = list(text)
    return zip(*[words[i:] for i in range(n)])

for _ in input():
    t = input()
    position = int(input())-1 # 0th indexing
    chargrams = chain(*[zipngram(t,i) for i in range(1,len(t)+1)])
    concatstr = ''.join(sorted([''.join(s) for s in chargrams]))
    print (concatstr[position])

But if the input file looks like this: http://pastebin.com/raw/WEi2p09H and the desired output is:

l
s
y
h
s

The interpreter will throw a MemoryError:

Traceback (most recent call last):
  File "solution.py", line 11, in <module>
    chargrams = chain(*[zipngram(t,i) for i in range(1,len(t)+1)])
  File "solution.py", line 11, in <listcomp>
    chargrams = chain(*[zipngram(t,i) for i in range(1,len(t)+1)])
  File "solution.py", line 6, in zipngram
    return zip(*[words[i:] for i in range(n)])
  File "solution.py", line 6, in <listcomp>
    return zip(*[words[i:] for i in range(n)])
MemoryError

How can the MemoryError be resolved? Is it solvable in another way using native python2 or python3?

I tried to resolve the MemoryError by pruning the stack using heapq but now it goes into an uber slow runtime pushing and popping the heap instead of taking up too much memory.

from itertools import chain
import heapq

t = int(input())
s = list(input())
k = int(input())

stack = []
for n in range(1,len(s)+1):
    for j in range(n):
        ngram = (''.join(s[j:]))
        ngram_len = len(ngram)
        # Push the ngram into the heapq.
        heapq.heappush(stack, ngram)
        pruned_stack = []
        temp_k = 0
        # Prune stack.
        while stack != [] and temp_k < k:
            x = heapq.heappop(stack)
            pruned_stack.append(x)
            temp_k+=len(x)
        # Replace stack with pruend_stack.
        stack = pruned_stack

print (''.join(chain(*pruned_stack))[k])

Is there a way to balance between not using too much memory that it leads to MemoryError and too slow runtime with heapq pushing and popping?

like image 955
alvas Avatar asked Dec 30 '15 14:12

alvas


1 Answers

MemoryError means that the program consumed all available memory and thus crashed.

A possible solution is using iterables (they work in Py2 too but Py3 has better support with them) that are lazy (they compute the value only on demand, not all at once).

Adapting your program to generators should need only minor changes, to index a generator without using a list (that would nullify the lazy benefit) see: Get the nth item of a generator in Python

like image 93
Caridorc Avatar answered Sep 28 '22 09:09

Caridorc