I am performing a statistical test that uses 10,000 permutations as a null distribution.
Each permutation is a dictionary with 10,000 keys. Each key is a gene, and each value is the set of patients corresponding to that gene. These dictionaries are programmatically generated and can be written to and read back from a file.
I want to be able to iterate over these permutations to perform my statistical test; however, keeping this large list in memory is slowing down my performance.
Is there a way to keep these dictionaries on disk and yield each permutation one at a time as I iterate over them?
Thank you!
This is a general computing problem: you want the speed of in-memory data but don't have enough memory to hold it all. You have at least these options:
Since you are iterating over your dataset, one solution is to load the data lazily:
    def get_data(filename):
        # Lazily yield one serialized permutation per line, so only one
        # record is held in memory at a time.
        with open(filename) as f:
            while True:
                line = f.readline()
                if line:
                    yield line
                else:
                    break

    for item in get_data('my_genes.dat'):
        gather_statistics(deserialize(item))
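For this to work, each permutation has to be written in a form that can be read back one record at a time. A minimal sketch, assuming each permutation dictionary is serialized as one JSON object per line; the write_permutations helper and the deserialize used above are illustrative names, not part of any library:

    import json

    def write_permutations(filename, permutations):
        # Write one permutation dictionary per line as JSON.
        # Sets are not JSON-serializable, so convert each patient set to a list.
        with open(filename, 'w') as f:
            for perm in permutations:
                serializable = {gene: list(patients) for gene, patients in perm.items()}
                f.write(json.dumps(serializable) + '\n')

    def deserialize(line):
        # Rebuild the gene -> set-of-patients mapping from one JSON line.
        raw = json.loads(line)
        return {gene: set(patients) for gene, patients in raw.items()}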
A variant is to split your data across multiple files, or store it in a database, so you can process it in batches of n items at a time, as sketched below.
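If you go the batching route, here is a sketch that reuses the get_data generator above and processes the permutations in chunks; the batch size of 100 is arbitrary:

    from itertools import islice

    def batched(iterable, n):
        # Yield lists of up to n items from any iterable.
        it = iter(iterable)
        while True:
            batch = list(islice(it, n))
            if not batch:
                break
            yield batch

    for batch in batched(get_data('my_genes.dat'), 100):
        # Work on 100 permutations at a time instead of all 10,000 at once.
        for item in batch:
            gather_statistics(deserialize(item))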