Say I have a very large list of integers that occupies a very large amount of memory. If the list's integers were in even increments, I could easily express the list as an iterator that occupies almost no memory by comparison. But with more complicated patterns, it becomes more difficult to express the list as an iterator.
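For example, an evenly incremented list can be replaced by a lazy range object; a minimal sketch (the increment and length here are made up):

# One billion multiples of 7 as a real list would need gigabytes of memory;
# an xrange object stores only start, stop, and step (use range on Python 3).
step = 7                         # hypothetical increment
lazy = xrange(0, 10**9, step)    # yields 0, 7, 14, ... without materializing a list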
Is there a Python package that can analyze a list of integers and return an "optimized" iterator? Or methodologies I can look into to accomplish this?
My proof of concept uses the lzma library (the backports.lzma package on Python 2) to compress the integers into an in-memory buffer. Instead of the memory buffer you can use a file on disk:
import io
import random
import struct
import sys
from backports import lzma
# Create array of integers with some duplicates
data = []
for i in xrange(0, 2000):
    data += [random.randint(-sys.maxint, sys.maxint)] * random.randint(0, 500)
print('Uncompressed: {}'.format(len(data)))
buff = io.BytesIO()
fmt = 'i' # check https://docs.python.org/3/library/struct.html#format-characters
lzma_writer = lzma.LZMAFile(buff, 'wb')
for i in data:
    lzma_writer.write(struct.pack(fmt, i))
lzma_writer.close()
print('Compressed: {}'.format(len(buff.getvalue())))
buff.seek(0)
lzma_reader = lzma.LZMAFile(buff, 'rb')
size_of = struct.calcsize(fmt)
def generate():
    r = lzma_reader.read(size_of)
    while len(r) != 0:
        yield struct.unpack(fmt, r)[0]
        r = lzma_reader.read(size_of)
# Test if it is same array
res = list(generate())
print(res == data)
Result:
Uncompressed: 496225
Compressed: 11568
True
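As mentioned above, the in-memory buffer can be swapped for a file on disk. A minimal sketch of that variant, reusing data, fmt, and generate() from the snippet above (the path data.xz is an arbitrary example):

# Write the packed integers through LZMA straight to a file instead of a BytesIO buffer.
with lzma.LZMAFile('data.xz', 'wb') as writer:
    for i in data:
        writer.write(struct.pack(fmt, i))

# Reopen for reading; rebinding lzma_reader lets generate() work unchanged.
lzma_reader = lzma.LZMAFile('data.xz', 'rb')
print(list(generate()) == data)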
I agree with Efron Licht: it entirely depends on the complexity of the particular list you want to compact (not to say 'compress'). Unless your lists are simple enough to express as generators, your only choice is to use Bartek Jablonski's answer.