Given a dataframe like the one below:
         cat   dog  hamster  dolphin
cat      1     0.5  0        0.25
dog      0.5   1    0        0
hamster  0     0    1        0.5
dolphin  0.25  0    0.5      1
I want to get the column values that are greater than zero for a given row, in dictionary format. For example, for the hamster row, the result should be:
{ 'hamster': 1, 'dolphin': 0.5 }
It would be even better to omit the column with the same name as the row, so for 'hamster' this would be preferable:
{ 'dolphin': 0.5 }
At the moment I get all values of the given row using df["hamster"].to_dict() and remove the zero values with a dictionary comprehension, like {k: v for (k, v) in d.items() if v > 0}, but this is far from ideal, as the original dataframe is about 50000 x 50000. Is there a simpler method in pandas to filter out the columns with value 0 (and the column with the same name, if that's easy to do)?
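You can use apply with to_dict to build a dictionary for each label and get a Series as output. Note that apply works column by column by default, which for a symmetric dataframe like this one gives the same result as working row by row: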
df.apply(lambda x: x[(x!=0) & (x.keys()!=x.name)].to_dict())
cat {'dog': 0.5, 'dolphin': 0.25}
dog {'cat': 0.5}
hamster {'dolphin': 0.5}
dolphin {'cat': 0.25, 'hamster': 0.5}
Or you can convert the above Series to a dictionary with the index as keys:
df.apply(lambda x: x[(x!=0) & (x.keys()!=x.name)].to_dict()).to_dict()
You get:
{'cat': {'dog': 0.5, 'dolphin': 0.25},
'dog': {'cat': 0.5},
'hamster': {'dolphin': 0.5},
'dolphin': {'cat': 0.25, 'hamster': 0.5}}
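Since the question asks about one given row, here is a minimal sketch of applying the same mask to a single row only. The helper name row_to_dict is made up for illustration, and the sample frame is rebuilt from the question so the snippet runs on its own. It uses > 0 as in the question rather than != 0.

```python
import pandas as pd

# Rebuild the sample frame from the question.
names = ["cat", "dog", "hamster", "dolphin"]
df = pd.DataFrame(
    [
        [1, 0.5, 0, 0.25],
        [0.5, 1, 0, 0],
        [0, 0, 1, 0.5],
        [0.25, 0, 0.5, 1],
    ],
    index=names,
    columns=names,
)


def row_to_dict(frame, label):
    """Hypothetical helper: positive values of one row, minus the label's own column."""
    row = frame.loc[label]                     # one row as a Series
    keep = (row > 0) & (row.index != label)    # drop zeros and the self entry
    return row[keep].to_dict()


hamster = row_to_dict(df, "hamster")
print(hamster)
```

This only touches the values of the requested row, so it may be a better fit than building every dictionary up front when the frame is very large and only a few rows are needed.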
If you get the following with pandas 1.1.2 (integer keys instead of the labels),
{0: {'dog': 0.5, 'dolphin': 0.25},
1: {'cat': 0.5},
2: {'dolphin': 0.5},
3: {'cat': 0.25, 'hamster': 0.5}}
you can explicitly specify the orient parameter:
df.to_dict('index')
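If the output of apply differs between pandas versions, one possible workaround (a sketch, not part of the original answer) is to build the outer dictionary with a plain comprehension over the columns, so the keys are always the labels. The function name nonzero_others is made up for illustration.

```python
import pandas as pd

# Rebuild the sample frame from the question.
names = ["cat", "dog", "hamster", "dolphin"]
df = pd.DataFrame(
    [
        [1, 0.5, 0, 0.25],
        [0.5, 1, 0, 0],
        [0, 0, 1, 0.5],
        [0.25, 0, 0.5, 1],
    ],
    index=names,
    columns=names,
)


def nonzero_others(col):
    """Hypothetical helper: same mask as the lambda above, for one column Series."""
    keep = (col != 0) & (col.index != col.name)
    return col[keep].to_dict()


# Iterate over the column labels explicitly instead of relying on apply.
result = {name: nonzero_others(df[name]) for name in df.columns}
print(result)
```

Like the apply version, this works column by column, so it matches the row-wise result only because the sample frame is symmetric.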