
Pandas dataframe into sparse dictionary of dictionaries

How do I convert a pandas DataFrame into a sparse dictionary of dictionaries, where only the entries that pass some cutoff are kept? In the toy example below, I only want the index labels for each column whose values are < 0.

import pandas as pd

table1 = [['gene_a', -1 , 1], ['gene_b', 1, 1],['gene_c', 0, -1]]
df1 = pd.DataFrame(table1)
df1.columns = ['gene','cell_1', 'cell_2']
df1 = df1.set_index('gene')
dfasdict = df1.to_dict(orient='dict')

This gives:

dfasdict = {'cell_1': {'gene_a': -1, 'gene_b': 1, 'gene_c': 0}, 'cell_2': {'gene_a': 1, 'gene_b': 1, 'gene_c': -1}}

But the desired output is a sparse dictionary, where only values less than zero are shown:

desired = {'cell_1': {'gene_a': -1}, 'cell_2': {'gene_c': -1}}

I can post-process the dfasdict dictionary after creating it, but I want to do the conversion in the same step, since post-processing means iterating over very large dictionaries. Is it possible to do this entirely within pandas?

Thomas Matthew asked Apr 12 '16 23:04


2 Answers

This uses a dictionary comprehension to build the result. For each of the columns cell_1 and cell_2, it selects the rows whose values are less than (lt) zero and converts that selection to a dictionary.

>>> {col: df1.loc[df1[col].lt(0), col].to_dict() for col in ['cell_1', 'cell_2']}
{'cell_1': {'gene_a': -1}, 'cell_2': {'gene_c': -1}}

To help understand what is going on here:

>>> df1['cell_1'].lt(0)
gene
gene_a     True
gene_b    False
gene_c    False
Name: cell_1, dtype: bool

>>> df1.loc[df1['cell_1'].lt(0), 'cell_1'].to_dict()
{'gene_a': -1}
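
The column names do not have to be hard-coded. As a minimal sketch, assuming you want every column of the frame and a configurable threshold (the name `cutoff` is only illustrative and does not come from the question), the same comprehension can loop over `df1.columns`:

```python
import pandas as pd

# Rebuild the toy frame from the question.
table1 = [['gene_a', -1, 1], ['gene_b', 1, 1], ['gene_c', 0, -1]]
df1 = pd.DataFrame(table1, columns=['gene', 'cell_1', 'cell_2']).set_index('gene')

cutoff = 0  # illustrative threshold: keep values strictly below it

# Same comprehension as above, but over all columns instead of a fixed list.
sparse = {col: df1.loc[df1[col].lt(cutoff), col].to_dict() for col in df1.columns}
print(sparse)
```

On the toy data this should give the same dictionary as the hard-coded version.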
Alexander answered Nov 01 '22 06:11


Delete the last line of your code (the to_dict call) and add this instead.

def to_dict_custom(data):
    # data.items() yields (column name, Series) pairs; keep only the negative values
    return {k: v[v < 0].to_dict() for k, v in data.items()}

dfasdict = to_dict_custom(df1)
print(dfasdict)

which yields,

{'cell_1': {'gene_a': -1}, 'cell_2': {'gene_c': -1}}

The def and return lines were inspired by here; please check.
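
If the cutoff may change, or if columns with no matching values should be left out entirely, the helper could take the condition as an argument. This is only a sketch: `to_sparse_dict`, `keep` and `drop_empty` are hypothetical names for illustration, not part of pandas.

```python
import pandas as pd

def to_sparse_dict(df, keep, drop_empty=True):
    """Build {column: {index_label: value}} from entries where keep(series) is True.

    keep is any function that maps a Series to a boolean mask.
    """
    result = {}
    for col, series in df.items():
        kept = series[keep(series)].to_dict()
        # Optionally skip columns where nothing passed the condition.
        if kept or not drop_empty:
            result[col] = kept
    return result

# Rebuild the toy frame from the question.
table1 = [['gene_a', -1, 1], ['gene_b', 1, 1], ['gene_c', 0, -1]]
df1 = pd.DataFrame(table1, columns=['gene', 'cell_1', 'cell_2']).set_index('gene')

below_zero = to_sparse_dict(df1, lambda s: s < 0)
above_zero = to_sparse_dict(df1, lambda s: s > 0)
print(below_zero)
print(above_zero)
```

Passing `drop_empty=False` keeps every column as a key, with an empty dictionary where nothing matched, which may be easier to work with if downstream code expects all cells to be present.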

su79eu7k answered Nov 01 '22 08:11