Initialize/Create/Populate a Dict of a Dict of a Dict in Python

I have used dictionaries in Python before, but I am still new to the language. This time I need a dictionary of dictionaries of dictionaries, i.e. a three-layer dict, and wanted to check before programming it.

I want to store all the data in this three-layer dict, and was wondering what would be a nice Pythonic way to initialize it, then read a file and write its contents into this data structure.

The dictionary I want is of the following type:

{'geneid':
    {'transcript_id':
        {col_name1: col_value1, col_name2: col_value2}
    }
}

The data is of this type:

geneid\ttx_id\tcolname1\tcolname2\n
hello\tNR432\t4.5\t6.7
bye\tNR439\t4.5\t6.7

Any ideas on how to do this in a good way?

Thanks!

Dnaiel asked Feb 17 '23 08:02


2 Answers

First, let's start with the csv module to handle parsing the lines:

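import csv
# Python 3: open in text mode with newline='', as the csv docs recommend
with open('mydata.txt', newline='') as f:
    # DictReader takes the column names from the header line
    for row in csv.DictReader(f, delimiter='\t'):
        print(row)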

This will print:

{'geneid': 'hello', 'tx_id': 'NR432', 'colname1': '4.5', 'colname2': '6.7'}
{'geneid': 'bye', 'tx_id': 'NR439', 'colname1': '4.5', 'colname2': '6.7'}

So, now you just need to reorganize that into your preferred structure. This is almost trivial, except that you have to deal with the fact that the first time you see a given geneid you have to create a new empty dict for it, and likewise for the first time you see a given tx_id within a geneid. You can solve that with setdefault:

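import csv
genes = {}
# Python 3: open in text mode with newline='', as the csv docs recommend
with open('mydata.txt', newline='') as f:
    for row in csv.DictReader(f, delimiter='\t'):
        # setdefault returns the existing inner dict, or stores and returns a new one
        gene = genes.setdefault(row['geneid'], {})
        transcript = gene.setdefault(row['tx_id'], {})
        transcript['colname1'] = row['colname1']
        transcript['colname2'] = row['colname2']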
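
To see the resulting structure without a file on disk, here is a minimal sketch that runs the sample rows from the question through the same setdefault logic. The io.StringIO stand-in and the SAMPLE name are assumptions for illustration, and the snippet is written for Python 3. It also copies every non-key column in a loop, which is a small variation on the code above rather than part of it.

```python
import csv
import io

# Hypothetical in-memory stand-in for mydata.txt, using the sample rows from the question.
SAMPLE = (
    "geneid\ttx_id\tcolname1\tcolname2\n"
    "hello\tNR432\t4.5\t6.7\n"
    "bye\tNR439\t4.5\t6.7\n"
)

genes = {}
for row in csv.DictReader(io.StringIO(SAMPLE), delimiter='\t'):
    # setdefault returns the existing inner dict, or stores and returns a new empty one.
    gene = genes.setdefault(row['geneid'], {})
    transcript = gene.setdefault(row['tx_id'], {})
    # Copy every column except the two key columns, so extra columns need no code change.
    for name, value in row.items():
        if name not in ('geneid', 'tx_id'):
            transcript[name] = value

print(genes)
# {'hello': {'NR432': {'colname1': '4.5', 'colname2': '6.7'}}, 'bye': {'NR439': {'colname1': '4.5', 'colname2': '6.7'}}}
```

Note that the values stay strings; convert them with float() if you need numbers.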

You can make this a bit more readable with defaultdict:

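import csv
from collections import defaultdict
from functools import partial
# Missing keys at the top level create a defaultdict(dict); missing keys inside that create a dict
genes = defaultdict(partial(defaultdict, dict))
# Python 3: open in text mode with newline='', as the csv docs recommend
with open('mydata.txt', newline='') as f:
    for row in csv.DictReader(f, delimiter='\t'):
        genes[row['geneid']][row['tx_id']]['colname1'] = row['colname1']
        genes[row['geneid']][row['tx_id']]['colname2'] = row['colname2']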

The trick here is that the top-level dict is a special one that returns a new defaultdict(dict) the first time it sees a key, and that inner defaultdict in turn returns an empty plain dict the first time it sees a key. The only hard part is that defaultdict takes a function that returns the right kind of object, so a function that returns a defaultdict(dict) has to be written with partial, a lambda, or an explicit function. (There are recipes on ActiveState and modules on PyPI that give you an even more general version of this, creating new dictionaries as needed all the way down, if you want.)
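
As a rough sketch of the alternatives just mentioned: the factory can be a lambda instead of a partial, and a factory that refers to itself gives the "all the way down" behaviour. The names two_level, tree and to_plain are made up for illustration and do not come from any particular recipe or PyPI module.

```python
from collections import defaultdict

# Same shape as defaultdict(partial(defaultdict, dict)), with a lambda as the factory.
two_level = defaultdict(lambda: defaultdict(dict))
two_level['hello']['NR432']['colname1'] = '4.5'


def tree():
    """Return a defaultdict whose missing keys are themselves trees, to any depth."""
    return defaultdict(tree)


def to_plain(d):
    """Recursively convert nested defaultdicts to plain dicts, e.g. before printing."""
    if isinstance(d, dict):
        return {key: to_plain(value) for key, value in d.items()}
    return d


genes = tree()
genes['bye']['NR439']['colname2'] = '6.7'

print(to_plain(two_level))  # {'hello': {'NR432': {'colname1': '4.5'}}}
print(to_plain(genes))      # {'bye': {'NR439': {'colname2': '6.7'}}}
```

One thing to watch with any defaultdict: merely reading a missing key, such as genes['typo'], creates that key. Converting to plain dicts once loading is done avoids surprises later.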

abarnert answered Mar 25 '23 10:03


I was also looking for alternatives and came across this great answer on Stack Overflow:

What's the best way to initialize a dict of dicts in Python?

Basically in my case:

class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value
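
A minimal usage sketch with the sample rows from the question follows. The class is repeated so the snippet runs on its own, and the rows list is a hypothetical stand-in for parsing the file.

```python
class AutoVivification(dict):
    """Same class as above, repeated so this snippet is self-contained."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            # Create, store and return a new nested instance for the missing key.
            value = self[item] = type(self)()
            return value


# Sample rows from the question, already split into fields (illustrative only).
rows = [
    ('hello', 'NR432', '4.5', '6.7'),
    ('bye', 'NR439', '4.5', '6.7'),
]

genes = AutoVivification()
for geneid, tx_id, col1, col2 in rows:
    # Intermediate levels are created on first access, so no initialization is needed.
    genes[geneid][tx_id]['colname1'] = col1
    genes[geneid][tx_id]['colname2'] = col2

print(genes)
# {'hello': {'NR432': {'colname1': '4.5', 'colname2': '6.7'}}, 'bye': {'NR439': {'colname1': '4.5', 'colname2': '6.7'}}}
```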

Dnaiel answered Mar 25 '23 09:03