I have used dictionaries in Python before, but I am still new to Python. This time I am using a dictionary of a dictionary of a dictionary, i.e. a three-layer dict, and wanted to check before programming it.
I want to store all the data in this three-layer dict, and was wondering what would be a nice, Pythonic way to initialize it, and then read a file and write to such a data structure.
The dictionary I want is of the following type:
{'geneid':
    {'transcript_id':
        {col_name1: col_value1, col_name2: col_value2}
    }
}
The data is of this type:
geneid\ttx_id\tcolname1\tcolname2\n
hello\tNR432\t4.5\t6.7
bye\tNR439\t4.5\t6.7
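For the two sample rows, the target structure would look like this. It is a hand-written illustration (the name expected is hypothetical), and the values are kept as strings because that is what the csv module produces:

```python
# Hand-built illustration of the requested three-layer dict for the sample rows.
# Layer 1: geneid -> layer 2: tx_id -> layer 3: column name -> value (string).
expected = {
    'hello': {'NR432': {'colname1': '4.5', 'colname2': '6.7'}},
    'bye': {'NR439': {'colname1': '4.5', 'colname2': '6.7'}},
}

# Lookups go gene -> transcript -> column.
print(expected['hello']['NR432']['colname2'])  # prints 6.7
```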
Any ideas on how to do this in a good way?
Thanks!
First, let's start with the csv module to handle parsing the lines:
import csv

# Python 3: open in text mode with newline='' (the old 'rb' mode only works in Python 2).
with open('mydata.txt', newline='') as f:
    for row in csv.DictReader(f, delimiter='\t'):
        print(row)
This will print:
{'geneid': 'hello', 'tx_id': 'NR432', 'colname1': '4.5', 'colname2': '6.7'}
{'geneid': 'bye', 'tx_id': 'NR439', 'colname1': '4.5', 'colname2': '6.7'}
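To try this without a file, here is a small sketch that feeds csv.DictReader an in-memory copy of the sample data through io.StringIO. The in-memory text stands in for mydata.txt, and SAMPLE is a hypothetical name. It shows that each row is a dict keyed by the header names, with every value a string:

```python
import csv
import io

# Stand-in for mydata.txt (assumption for this demo): the tab-separated sample.
SAMPLE = (
    "geneid\ttx_id\tcolname1\tcolname2\n"
    "hello\tNR432\t4.5\t6.7\n"
    "bye\tNR439\t4.5\t6.7\n"
)

# The first line supplies the keys; every later line becomes one dict.
rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter='\t'))
for row in rows:
    # Values are strings; csv does not convert numbers.
    print(row['geneid'], row['tx_id'], row['colname1'], row['colname2'])
```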
So now you just need to reorganize that into your preferred structure. This is almost trivial, except that the first time you see a given geneid you have to create a new empty dict for it, and likewise the first time you see a given tx_id within a geneid. You can solve that with setdefault:
import csv

genes = {}
with open('mydata.txt', newline='') as f:
    for row in csv.DictReader(f, delimiter='\t'):
        # Get (or create) the dict for this gene, then for this transcript.
        gene = genes.setdefault(row['geneid'], {})
        transcript = gene.setdefault(row['tx_id'], {})
        transcript['colname1'] = row['colname1']
        transcript['colname2'] = row['colname2']
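A minimal, self-contained sketch of the same setdefault approach, under two labeled assumptions: the file is replaced by an in-memory io.StringIO copy of the sample data, and every column other than geneid and tx_id is copied and converted with float() (drop the float() call if some columns are not numeric). The helper load_genes and the KEY_COLUMNS tuple are hypothetical names:

```python
import csv
import io

# Stand-in for mydata.txt (assumption for this demo).
SAMPLE = (
    "geneid\ttx_id\tcolname1\tcolname2\n"
    "hello\tNR432\t4.5\t6.7\n"
    "bye\tNR439\t4.5\t6.7\n"
)

KEY_COLUMNS = ('geneid', 'tx_id')  # hypothetical: the two columns used as keys


def load_genes(lines):
    """Build {geneid: {tx_id: {column: value}}} using dict.setdefault."""
    genes = {}
    for row in csv.DictReader(lines, delimiter='\t'):
        gene = genes.setdefault(row['geneid'], {})
        transcript = gene.setdefault(row['tx_id'], {})
        # Copy every non-key column, so column names need not be hard-coded.
        for name, value in row.items():
            if name not in KEY_COLUMNS:
                transcript[name] = float(value)  # assumption: value columns are numeric
    return genes


genes = load_genes(io.StringIO(SAMPLE))
print(genes['hello']['NR432'])  # {'colname1': 4.5, 'colname2': 6.7}
```

Looping over row.items() means that extra columns added to the file are picked up without editing the code.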
You can make this a bit more readable with defaultdict:
import csv
from collections import defaultdict
from functools import partial

# New gene ids get a defaultdict(dict); new tx_ids inside it get a plain dict.
genes = defaultdict(partial(defaultdict, dict))
with open('mydata.txt', newline='') as f:
    for row in csv.DictReader(f, delimiter='\t'):
        genes[row['geneid']][row['tx_id']]['colname1'] = row['colname1']
        genes[row['geneid']][row['tx_id']]['colname2'] = row['colname2']
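One thing to keep in mind with the defaultdict version: after loading, merely reading a missing key (for example a mistyped gene id) silently inserts it. A small sketch, again assuming an in-memory copy of the sample data in place of mydata.txt, that converts the finished structure into ordinary nested dicts with a hypothetical to_plain_dict helper, so later lookups of missing keys raise KeyError instead:

```python
import csv
import io
from collections import defaultdict
from functools import partial

# Stand-in for mydata.txt (assumption for this demo).
SAMPLE = (
    "geneid\ttx_id\tcolname1\tcolname2\n"
    "hello\tNR432\t4.5\t6.7\n"
    "bye\tNR439\t4.5\t6.7\n"
)

genes = defaultdict(partial(defaultdict, dict))
for row in csv.DictReader(io.StringIO(SAMPLE), delimiter='\t'):
    genes[row['geneid']][row['tx_id']]['colname1'] = row['colname1']
    genes[row['geneid']][row['tx_id']]['colname2'] = row['colname2']


def to_plain_dict(d):
    """Recursively copy nested dicts (including defaultdicts) into plain dicts."""
    if isinstance(d, dict):
        return {key: to_plain_dict(value) for key, value in d.items()}
    return d  # leaf value, e.g. a string from the file


plain = to_plain_dict(genes)  # freeze into ordinary nested dicts

# Pitfall on the defaultdict version: reading a missing key inserts it.
_ = genes['no_such_gene']
print('no_such_gene' in genes)  # True
# The plain copy was made before that read and would raise KeyError instead.
```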
The trick here is that the top-level dict is a defaultdict that creates a new, empty defaultdict(dict) whenever it first sees a new geneid, and that inner defaultdict in turn creates a new, empty dict whenever it first sees a new tx_id. The only hard part is that defaultdict takes a function that returns the right kind of object, and a function that returns a defaultdict(dict) has to be written with a partial, a lambda, or an explicit function. (There are recipes on ActiveState and modules on PyPI that will give you an even more general version of this that creates new dictionaries as needed all the way down, if you want.)
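To make the partial/lambda/function point concrete, here is a sketch of three equivalent ways to build the same two-level factory (make_inner is a hypothetical name), plus a widely used one-function recipe, here called tree, for dicts that are created as needed at any depth. The tree function is a generic recipe, not necessarily the ActiveState or PyPI version mentioned above:

```python
from collections import defaultdict
from functools import partial

# Three equivalent factories for the inner level, defaultdict(dict):
by_partial = defaultdict(partial(defaultdict, dict))
by_lambda = defaultdict(lambda: defaultdict(dict))


def make_inner():
    """Hypothetical explicit factory: returns a fresh defaultdict(dict)."""
    return defaultdict(dict)


by_function = defaultdict(make_inner)

for d in (by_partial, by_lambda, by_function):
    d['hello']['NR432']['colname1'] = '4.5'


def tree():
    """Recursive factory: every missing key gets another tree, at any depth."""
    return defaultdict(tree)


t = tree()
t['hello']['NR432']['colname1'] = '4.5'
t['a']['b']['c']['d'] = 1  # works for any nesting depth
```

One practical difference between the three: the standard pickle module cannot pickle a defaultdict whose factory is a lambda, whereas the partial and module-level function versions can be pickled.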
I was also looking for alternatives and found another great answer on Stack Overflow, to the question "What's the best way to initialize a dict of dicts in Python?"
Basically, in my case:
class AutoVivification(dict):
    """Implementation of Perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            # Missing key: store a new, empty instance of the same class and return it.
            value = self[item] = type(self)()
            return value
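A sketch of how the AutoVivification class could be used to load the sample data, again with an in-memory copy of the sample standing in for mydata.txt. Like defaultdict, it creates missing keys on read, but at every depth, so a misspelled column name such as genes['hello']['NR432']['colnme1'] returns an empty AutoVivification instead of raising KeyError:

```python
import csv
import io


class AutoVivification(dict):
    """Implementation of Perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            # Missing key: store a new, empty instance of the same class and return it.
            value = self[item] = type(self)()
            return value


# Stand-in for mydata.txt (assumption for this demo).
SAMPLE = (
    "geneid\ttx_id\tcolname1\tcolname2\n"
    "hello\tNR432\t4.5\t6.7\n"
    "bye\tNR439\t4.5\t6.7\n"
)

genes = AutoVivification()
for row in csv.DictReader(io.StringIO(SAMPLE), delimiter='\t'):
    # The first two lookups auto-create the gene and transcript levels.
    genes[row['geneid']][row['tx_id']]['colname1'] = row['colname1']
    genes[row['geneid']][row['tx_id']]['colname2'] = row['colname2']

print(genes['bye']['NR439']['colname2'])  # prints 6.7
```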