
Best way to handle a large list of dictionaries in Python

I am performing a statistical test that uses 10,000 permutations as a null distribution.

Each permutation is a 10,000-key dictionary. Each key is a gene; each value is the set of patients associated with that gene. The dictionary is generated programmatically and can be written to and read back from a file.
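For concreteness, one way such a permutation could be written out and read back, assuming one JSON object per line (the gene names, patient IDs, and file name here are only illustrative):

import json

# Minimal sketch (assumed encoding): each permutation is a dict of
# gene -> set of patient IDs, stored as one JSON object per line.
# Sets are not JSON-serializable, so they are converted to sorted lists.
permutation = {'BRCA1': {'patient1', 'patient7'}, 'TP53': {'patient2'}}

with open('permutations.jsonl', 'a') as f:
    f.write(json.dumps({g: sorted(p) for g, p in permutation.items()}) + '\n')

with open('permutations.jsonl') as f:
    for line in f:
        perm = {g: set(p) for g, p in json.loads(line).items()}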

I want to iterate over these permutations to perform my statistical test; however, keeping this entire list in memory is slowing my program down.

Is there a way to keep these dictionaries stored on disk and yield the permutations one at a time as I iterate over them?

Thank you!

asked Aug 29 '15 by Jonathan Lu

1 Answer

This is a general computing problem: you want the speed of in-memory data but don't have enough RAM to hold it all. You have at least these options:

  • Buy more RAM (obviously)
  • Let the process swap. This leaves it to the OS to decide which data to store on disk and which to store in memory
  • Don't load everything into memory at once

Since you are iterating over the dataset, one solution is to load the data lazily with a generator:

def get_data(filename):
    # Lazily yield one item (one serialized permutation) per line,
    # so only a single line is held in memory at a time.
    with open(filename) as f:
        while True:
            line = f.readline()
            if not line:  # an empty string means end of file
                break
            yield line

for item in get_data('my_genes.dat'):
    # deserialize() stands for whatever decoding matches your file
    # format, e.g. json.loads for one JSON object per line.
    gather_statistics(deserialize(item))

A variant is to split the data across multiple files, or store it in a database, so you can process it in batches of n items at a time, as in the sketch below.
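A minimal sketch of that batching approach, reusing the get_data and deserialize names from above (the batch size of 100 is an arbitrary choice):

from itertools import islice

def batches(iterable, n):
    # Yield successive lists of up to n items from any iterable.
    it = iter(iterable)
    while True:
        batch = list(islice(it, n))
        if not batch:
            return
        yield batch

for batch in batches(get_data('my_genes.dat'), 100):
    for item in batch:
        gather_statistics(deserialize(item))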

answered Nov 15 '22 by Erik Cederstrand