Python already has a multi-key dict and a multi-valued dict. I need a Python dictionary that is both.
Example:
# probabilistically fetch any one of balloon, toy or car
d['red','blue','green'] == "balloon" or "car" or "toy"
The probability that d['red'] == d['green'] is high, and the probability that d['red'] != d['red'] is low but nonzero.
The single output value should be determined probabilistically (fuzzily) by a rule over the keys. In the case above, the rule could be: if the keys include both "red" and "blue", return "balloon" 80% of the time; if only "blue", return "toy" 15% of the time; otherwise return "car" 5% of the time.
The __setitem__ method should be designed so that the following is possible:
d["red", "blue"] = [
    ("balloon", haseither('red','green'), 0.8),
    ("toy",.....)
    ,....
]
The above assigns multiple values to the dictionary, each with a predicate function and a corresponding probability. Instead of the list above, a dictionary as the assigned value would be even better:
d["red", "blue"] = {
    "balloon": haseither('red','green',0.8),
    "toy": hasonly("blue",0.15),
    "car": default(0.05)
}
In the above, "balloon" is returned 80% of the time if "red" or "green" is present, "toy" 15% of the time if "blue" is present, and "car" 5% of the time without any condition.
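One possible reading of that assignment, as a minimal sketch only: each value carries a predicate over the queried keys plus a weight, and a lookup draws among the values whose predicate holds. The helpers haseither, hasonly and default are the hypothetical names used above, not an existing library. The function pick, and the choice to renormalise the weights over the matching rules, are assumptions made for illustration.

```python
import random

# Hypothetical helpers named after the question; each returns (predicate, weight).
def haseither(*args):
    *keys, weight = args  # last positional argument is the weight
    return (lambda query: any(k in query for k in keys), weight)

def hasonly(key, weight):
    return (lambda query: set(query) == {key}, weight)

def default(weight):
    return (lambda query: True, weight)

def pick(rules, query, rng=random):
    """Draw one value among the rules whose predicate matches the queried keys."""
    query = frozenset(query)
    matching = [(value, weight)
                for value, (predicate, weight) in rules.items()
                if predicate(query)]
    if not matching:
        return None
    values, weights = zip(*matching)
    # Assumption: weights are treated as relative among the matching rules.
    return rng.choices(values, weights=weights, k=1)[0]

rules = {
    "balloon": haseither("red", "green", 0.8),
    "toy": hasonly("blue", 0.15),
    "car": default(0.05),
}
```

With these rules, pick(rules, ["blue"]) can only return "toy" or "car", because the "balloon" predicate does not match.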
Are there any existing data structures in Python that already satisfy these requirements? If not, how can the multikeydict code be modified to meet them?
If a dictionary is used, there could be a configuration file, or appropriate nested decorators, to configure the probabilistic predicate logic above without hard-coding if/else statements.
Note: The above is a useful automaton for a rule-based auto-responder application, so do let me know if a similar rule-based framework is available in Python, even if it does not use a dictionary structure.
multi_key_dict did not allow __getitem__() with multiple keys at once (e.g. d["red", "green"]).
A multi key can be simulated with tuple or set keys. If order does not matter, a set seems best (actually the hashable frozenset, so that ["red", "blue"] is the same as ["blue", "red"]).
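To illustrate the point about order, here is a small sketch. The names store, ordered and lookup are made up for this example.

```python
# A frozenset is hashable and ignores element order, so it can serve as a dict key.
store = {}
store[frozenset(["red", "blue"])] = "balloon"

def lookup(mapping, *keys):
    """Hypothetical helper: look up a value by an unordered group of keys."""
    return mapping[frozenset(keys)]

# A tuple key, by contrast, is order-sensitive.
ordered = {("red", "blue"): "balloon"}
```

Here lookup(store, "blue", "red") and lookup(store, "red", "blue") find the same entry, while ("blue", "red") is not a key of ordered.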
Multiple values come for free with the right data type: any storage element that can be conveniently indexed will do. A standard dict provides that.
Using a probability distribution defined by the rules and assumptions (see Distribution Assumption below), non-deterministic selection is performed using a weighted-selection recipe from the Python docs.
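As a rough Python 3 sketch of that kind of weighted selection (the function name weighted_choice and the rng parameter are mine, chosen for illustration): build the running totals of the weights, scale a uniform random number to the total, and bisect to find which bucket it lands in.

```python
import bisect
import itertools
import random

def weighted_choice(weighted_choices, rng=random):
    """Pick one choice with probability proportional to its weight."""
    choices, weights = zip(*weighted_choices)
    cumulative = list(itertools.accumulate(weights))  # running totals
    # Scale a uniform draw to the total weight, then find its bucket.
    point = rng.random() * cumulative[-1]
    return choices[bisect.bisect(cumulative, point)]
```

In Python 3.6 and later, random.choices(population, weights=...) offers the same kind of weighted draw directly.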
MultiKeyMultiValNonDeterministicDict Class
What a name. \o/-nice!
This class takes multiple keys that define a probabilistic rule set of multiple values. During item creation (__setitem__()), all value probabilities are precomputed for all combinations of keys (see Distribution Assumption below). During item access (__getitem__()), the precomputed probability distribution is selected and the result is evaluated based on a random weighted selection.
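To make "all combinations of keys" concrete, here is a small standalone sketch of the subset enumeration. It is written as a free function for illustration and includes the empty set, which acts as the default rule.

```python
import itertools

def key_combinations(keys):
    """Return every subset of keys (including the empty set) as a frozenset."""
    keys = list(keys)
    return [
        frozenset(subset)
        for size in range(len(keys) + 1)
        for subset in itertools.combinations(keys, size)
    ]

# Three keys give 2**3 = 8 subsets, from frozenset() up to all three keys.
combos = key_combinations(["red", "blue", "green"])
```

Note that the number of subsets doubles with every extra key, so precomputing a distribution per subset is only practical for small key groups.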
import random
import operator
import bisect
import itertools

# or use itertools.accumulate in python 3
def accumulate(iterable, func=operator.add):
    'Return running totals'
    # accumulate([1,2,3,4,5]) --> 1 3 6 10 15
    # accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120
    it = iter(iterable)
    try:
        total = next(it)
    except StopIteration:
        return
    yield total
    for element in it:
        total = func(total, element)
        yield total
class MultiKeyMultiValNonDeterministicDict(dict):

    def key_combinations(self, keys):
        """get all combinations of keys"""
        return [frozenset(subset) for L in range(0, len(keys)+1) for subset in itertools.combinations(keys, L)]

    def multi_val_rule_prob(self, rules, rule):
        """
        assign probabilities for each value,
        spreading undefined result probabilities
        uniformly over the leftover results not defined by rule.
        """
        all_results = set([result for result_probs in rules.values() for result in result_probs])
        # copy, so the caller's rule set is not modified in place
        prob = dict(rules[rule])
        leftover_prob = 1.0 - sum(prob.values())
        leftover_results = len(all_results) - len(prob)
        for result in all_results:
            if result not in prob:
                # spread undefined prob uniformly over leftover results
                prob[result] = leftover_prob/leftover_results
        return prob

    def multi_key_rule_prob(self, key, val):
        """
        assign probability distributions for every combination of keys,
        using the default for combinations not defined in rule set
        """
        combo_probs = {}
        for combo in self.key_combinations(key):
            if combo in val:
                result_probs = list(self.multi_val_rule_prob(val, combo).items())
            else:
                result_probs = list(self.multi_val_rule_prob(val, frozenset([])).items())
            combo_probs[combo] = result_probs
        return combo_probs

    def weighted_random_choice(self, weighted_choices):
        """make choice from weighted distribution"""
        choices, weights = zip(*weighted_choices)
        cumdist = list(accumulate(weights))
        return choices[bisect.bisect(cumdist, random.random() * cumdist[-1])]

    def __setitem__(self, key, val):
        """
        set item in dictionary,
        assigns values to keys with precomputed probability distributions
        """
        precompute_val_probs = self.multi_key_rule_prob(key, val)
        # use to show ALL precomputed probabilities for key's rule set
        # print(precompute_val_probs)
        dict.__setitem__(self, frozenset(key), precompute_val_probs)

    def __getitem__(self, key):
        """
        get item from dictionary,
        randomly select value based on rule probability
        """
        key = frozenset([key]) if isinstance(key, str) else frozenset(key)
        val = None
        weighted_val = None
        if key in self.keys():
            val = dict.__getitem__(self, key)
            weighted_val = val[key]
        else:
            # fall back to any stored key group that contains the queried keys
            for k in self.keys():
                if key.issubset(k):
                    val = dict.__getitem__(self, k)
                    weighted_val = val[key]
        # used to show probability for key
        # print(weighted_val)
        if weighted_val:
            prob_results = self.weighted_random_choice(weighted_val)
        else:
            prob_results = None
        return prob_results
d = MultiKeyMultiValNonDeterministicDict()
d["red","blue","green"] = {
    # {rule_set} : {result: probability}
    frozenset(["red", "green"]): {"ballon": 0.8},
    frozenset(["blue"]): {"toy": 0.15},
    frozenset([]): {"car": 0.05}
}
Check the probabilities
N = 10000
red_green_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
red_blue_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
blue_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
red_blue_green_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
default_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}

# count how often each result is returned for each key combination
for _ in range(N):
    red_green_test[d["red","green"]] += 1.0
    red_blue_test[d["red","blue"]] += 1.0
    blue_test[d["blue"]] += 1.0
    default_test[d["green"]] += 1.0
    red_blue_green_test[d["red","blue","green"]] += 1.0

print('red,green test =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_green_test.items()))
print('red,blue test =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_blue_test.items()))
print('blue test =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in blue_test.items()))
print('default test =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in default_test.items()))
print('red,blue,green test =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_blue_green_test.items()))
red,green test = car: 09.89% toy: 10.06% ballon: 80.05%
red,blue test = car: 05.30% toy: 47.71% ballon: 46.99%
blue test = car: 41.69% toy: 15.02% ballon: 43.29%
default test = car: 05.03% toy: 47.16% ballon: 47.81%
red,blue,green test = car: 04.85% toy: 49.20% ballon: 45.95%
Distribution Assumption
Since the rule set is not fully defined, assumptions are made about the probability distributions; most of this is done in multi_val_rule_prob(). Basically, any undefined probability is spread uniformly over the remaining values. This is done for all combinations of keys, and creates a generalized key interface for the random weighted selection.
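The uniform spreading can be sketched on its own like this. fill_uniform is a name made up for this illustration, and the result spelling "ballon" follows the example rule set.

```python
def fill_uniform(defined, all_results):
    """Return a full distribution: keep defined probabilities, split the rest evenly."""
    missing = [r for r in all_results if r not in defined]
    leftover = 1.0 - sum(defined.values())
    filled = dict(defined)  # copy so the caller's rule is not modified
    for result in missing:
        filled[result] = leftover / len(missing)
    return filled

results = ["car", "toy", "ballon"]
# Only "ballon" is defined (0.8); the remaining 0.2 is split between "car" and "toy".
red_green = fill_uniform({"ballon": 0.8}, results)
```

For the rule that only defines "toy" at 0.15, the same function gives "car" and "ballon" 0.425 each.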
Given the example rule set
d["red","blue","green"] = {
    # {rule_set} : {result: probability}
    frozenset(["red", "green"]): {"ballon": 0.8},
    frozenset(["blue"]): {"toy": 0.15},
    frozenset([]): {"car": 0.05}
}
this will create the following distributions
'red'            = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
'green'          = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
'blue'           = [('car', 0.425), ('toy', 0.150), ('ballon', 0.425)]
'blue,red'       = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
'green,red'      = [('car', 0.100), ('toy', 0.100), ('ballon', 0.800)]
'blue,green'     = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
'blue,green,red' = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
default          = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
If this is incorrect, please advise.