Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Building a table from Python nested dictionaries with missing values

Here's the current data structure I have:

{'bin1': {'A': 14545,
          'B': 18579,
          'C': 5880,
          'D': 20771,
          'E': 404396},
 'bin2': {'A': 13200,
          'B': 12279,
          'C': 5000,
          'D': 16766,
          'E': 200344},
 [...] }

And I'd like to write to a table, in the form:

        A     B     C    D     E    
bin1  14545 18579 5880 20771 494396
bin2  13200 12279 5000 16766 200344
...

Currently I'm using a crude print loop (where d == the dictionary described above):

# print the table header
labs = [i for i in d[d.keys()[0]]]
print "bin" + "\t" + "\t".join(labs)

# loop and print the values
for j in d:
    print j + "\t" + "\t".join(map(str, [d[j][q] for q in d[j]]))

Which appeared to work (despite obviously being unordered), however in my output I've found that some nested dictionaries don't contain all fields A-E. I guess one way to do this is to rebuild the dictionary, maybe using a more appropriate matrix-type data structure and fill in the missing values with 0, but given I have this huge nested dictionary to hand, is there a smarter way of printing the output shown above that could handle missing elements?

like image 646
blmoore Avatar asked Sep 11 '13 16:09

blmoore


2 Answers

Using pandas:

import pandas as pd

data = {'bin1': {'A': 14545,
          'B': 18579,
          'C': 5880,
          'D': 20771,
          'E': 404396},
        'bin2': {'A': 13200,
          'D': 16766,
          'E': 200344},
        }

df = pd.DataFrame(data).T
df.fillna(0, inplace=True)
print(df)

prints

          A      B     C      D       E
bin1  14545  18579  5880  20771  404396
bin2  13200      0     0  16766  200344

The df.fillna(0) replaces missing values with 0.

like image 61
unutbu Avatar answered Nov 12 '22 05:11

unutbu


You can use d[j].get(q, '0') instead of d[j][q] to fill in 0 for all missing entries:

# print the table header
labs = sorted(max(d.values(), key=len))
print "bin" + "\t" + "\t".join(labs)

# loop and print the values
for j in d:
    print j + "\t" + "\t".join(str(d[j].get(q, '0')) for q in labs)

I also made some slight modifications to the other parts of the code so the columns are ordered.

like image 42
Andrew Clark Avatar answered Nov 12 '22 06:11

Andrew Clark