How do I convert a pandas DataFrame into a sparse dictionary of dictionaries, where only the entries that pass some cutoff are kept? In the toy example below, I only want the index labels in each column whose values are < 0.
import pandas as pd
table1 = [['gene_a', -1, 1], ['gene_b', 1, 1], ['gene_c', 0, -1]]
df1 = pd.DataFrame(table1)
df1.columns = ['gene','cell_1', 'cell_2']
df1 = df1.set_index('gene')
dfasdict = df1.to_dict(orient='dict')
This gives:
dfasdict = {'cell_1': {'gene_a': -1, 'gene_b': 1, 'gene_c': 0}, 'cell_2': {'gene_a': 1, 'gene_b': 1, 'gene_c': -1}}
But the desired output is a sparse dictionary, where only values less than zero are shown:
desired = {'cell_1': {'gene_a': -1}, 'cell_2': {'gene_c': -1}}
I can do some processing to change the dfasdict dictionary after creation, but I want to do the conversion in the same step, since processing afterwards involves iterating over very large dictionaries. Is it possible to do this entirely within pandas?
This answer uses a dictionary comprehension to generate the result. For each of the columns cell_1 and cell_2, it selects the values that are less than (lt) zero and converts the result to a dictionary.
>>> {col: df1.loc[df1[col].lt(0), col].to_dict() for col in ['cell_1', 'cell_2']}
{'cell_1': {'gene_a': -1}, 'cell_2': {'gene_c': -1}}
To help understand what is going on here:
>>> df1['cell_1'].lt(0)
gene
gene_a     True
gene_b    False
gene_c    False
Name: cell_1, dtype: bool
>>> df1.loc[df1['cell_1'].lt(0), 'cell_1'].to_dict()
{'gene_a': -1}
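The comprehension above hard-codes the column names and the cutoff. As a minimal sketch, it could be wrapped in a small helper that loops over every column and takes the cutoff as a parameter. The function name sparse_dict and its cutoff argument are hypothetical, introduced here only for illustration; the pandas calls are the same ones used in the answer.

```python
import pandas as pd


def sparse_dict(df, cutoff=0):
    """Hypothetical helper: for each column, keep only entries strictly below `cutoff`."""
    # df[col].lt(cutoff) is a boolean mask; .loc selects the matching rows of that column
    return {col: df.loc[df[col].lt(cutoff), col].to_dict() for col in df.columns}


# Rebuild the toy frame from the question
table1 = [['gene_a', -1, 1], ['gene_b', 1, 1], ['gene_c', 0, -1]]
df1 = pd.DataFrame(table1, columns=['gene', 'cell_1', 'cell_2']).set_index('gene')

result = sparse_dict(df1)
print(result)
```

A column with no values below the cutoff still appears in the result, mapped to an empty dictionary.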
Delete the last line of your code and add this instead.
# DataFrame.items() replaces pandas.compat.iteritems, which recent pandas versions no longer provide
def to_dict_custom(data):
    return {k: v[v < 0].to_dict() for k, v in data.items()}  # keep only the negative values of each column
dfasdict = to_dict_custom(df1)
print(dfasdict)
which yields,
{'cell_1': {'gene_a': -1}, 'cell_2': {'gene_c': -1}}
line 3&4 inspired by here please check.