Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Multikey Multivalue Non Deterministic python dictionary

There is already a multi key dict in python and also a multivalued dict. I needed a python dictionary which is both:

example:

# probabilistically fetch any one of baloon, toy or car
d['red','blue','green']== "baloon" or "car" or "toy"  

Probability of d['red']==d['green'] is high and Probability of d['red']!=d['red'] is low but possible

the single output value should be probabilistically determined (fuzzy) based on a rule from keys eg:in above case rule could be if keys have both "red" and "blue" then return "baloon" 80% of time if only blue then return "toy" 15% of time else "car" 5% of time.

The setitem method should be designed such that following is possible:

d["red", "blue"] =[
    ("baloon",haseither('red','green'),0.8),
    ("toy",.....)
    ,....
]

Above assigns multiple values to the dictionary with a predicate function and corresponding probability. And instead of the assignment list above even a dictionary as assignment would be preferable:

d["red", "blue"] ={ 
    "baloon": haseither('red','green',0.8),
    "toy": hasonly("blue",0.15),
    "car": default(0.05)
}

In the above baloon will be returned 80% of time if "red" or green is present , return toy 15% of time if blue present and return car 5% of time without any condition.

Are there any existing data structures which already satisfy the above requirements in python? if no then how can multikeydict code be modified to meet the above requirements in python?

if using dictionary then there can be a configuration file or use of appropriate nested decorators which configures the above probabilistic predicate logics without having to hard code if \else statements .

Note: Above is a useful automata for a rule based auto responder application hence do let me know if any similar rule based framework is available in python even if it does not use the dictionary structure?

like image 894
stackit Avatar asked Apr 29 '15 08:04

stackit


1 Answers

Simulated MultiKey Dictionary

multi_key_dict did not allow __getitem__() with multiple keys at onces...

(e.g. d["red", "green"])

A multi key can be simulated with tuple or set keys. If order does not matter, set seems the best (actually the hashable frozen set, so that ["red", "blue"] is the same a ["blue", "red"].

Simulated MultiVal Dictionary

Multi values are inherent by using certain datatypes, it can be any storage element that may be conveniently indexed. A standard dict should provide that.

Non-determinism

Using a probability distribution defined by the rules and assumptions1, non-deterministic selection is performed using this recipe from the python docs.

MultiKeyMultiValNonDeterministicDict Class

What a name.   \o/-nice!

This class takes multiple keys that define a probabilistic rule set of multiple values. During item creation (__setitem__()) all value probabilities are precomputed for all combinations of keys1. During item access (__getitem__()) the precomputed probability distribution is selected and the result is evaluated based on a random weighted selection.

Definition

import random
import operator
import bisect
import itertools

# or use itertools.accumulate in python 3
def accumulate(iterable, func=operator.add):
    'Return running totals'
    # accumulate([1,2,3,4,5]) --> 1 3 6 10 15
    # accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120
    it = iter(iterable)
    try:
        total = next(it)
    except StopIteration:
        return
    yield total
    for element in it:
        total = func(total, element)
        yield total

class MultiKeyMultiValNonDeterministicDict(dict):

    def key_combinations(self, keys):
        """get all combinations of keys"""
        return [frozenset(subset) for L in range(0, len(keys)+1) for subset in itertools.combinations(keys, L)]

    def multi_val_rule_prob(self, rules, rule):
        """
        assign probabilities for each value, 
        spreading undefined result probabilities
        uniformly over the leftover results not defined by rule.
        """
        all_results = set([result for result_probs in rules.values() for result in result_probs])
        prob = rules[rule]
        leftover_prob = 1.0 - sum([x for x in prob.values()])
        leftover_results = len(all_results) - len(prob)
        for result in all_results:
            if result not in prob:
                # spread undefined prob uniformly over leftover results
                prob[result] = leftover_prob/leftover_results
        return prob

    def multi_key_rule_prob(self, key, val):
        """
        assign probability distributions for every combination of keys,
        using the default for combinations not defined in rule set
        """ 
        combo_probs = {}
        for combo in self.key_combinations(key):
            if combo in val:
                result_probs = self.multi_val_rule_prob(val, combo).items()
            else:
                result_probs = self.multi_val_rule_prob(val, frozenset([])).items()
            combo_probs[combo] = result_probs
        return combo_probs

    def weighted_random_choice(self, weighted_choices):
        """make choice from weighted distribution"""
        choices, weights = zip(*weighted_choices)
        cumdist = list(accumulate(weights))
        return choices[bisect.bisect(cumdist, random.random() * cumdist[-1])]

    def __setitem__(self, key, val):
        """
        set item in dictionary, 
        assigns values to keys with precomputed probability distributions
        """

        precompute_val_probs = self.multi_key_rule_prob(key, val)        
        # use to show ALL precomputed probabilities for key's rule set
        # print precompute_val_probs        

        dict.__setitem__(self, frozenset(key), precompute_val_probs)

    def __getitem__(self, key):
        """
        get item from dictionary, 
        randomly select value based on rule probability
        """
        key = frozenset([key]) if isinstance(key, str) else frozenset(key)             
        val = None
        weighted_val = None        
        if key in self.keys():
            val = dict.__getitem__(self, key)
            weighted_val = val[key]
        else:
            for k in self.keys():
                if key.issubset(k):
                    val = dict.__getitem__(self, k)
                    weighted_val = val[key]

        # used to show probabality for key
        # print weighted_val

        if weighted_val:
            prob_results = self.weighted_random_choice(weighted_val)
        else:
            prob_results = None
        return prob_results

Usage

d = MultiKeyMultiValNonDeterministicDict()

d["red","blue","green"] = {
    # {rule_set} : {result: probability}
    frozenset(["red", "green"]): {"ballon": 0.8},
    frozenset(["blue"]): {"toy": 0.15},
    frozenset([]): {"car": 0.05}
}

Testing

Check the probabilities

N = 10000
red_green_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
red_blue_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
blue_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
red_blue_green_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}
default_test = {'car':0.0, 'toy':0.0, 'ballon':0.0}

for _ in xrange(N):
    red_green_test[d["red","green"]] += 1.0
    red_blue_test[d["red","blue"]] += 1.0
    blue_test[d["blue"]] += 1.0
    default_test[d["green"]] += 1.0
    red_blue_green_test[d["red","blue","green"]] += 1.0

print 'red,green test      =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_green_test.items())
print 'red,blue test       =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_blue_test.items())
print 'blue test           =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in blue_test.items())
print 'default test        =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in default_test.items())
print 'red,blue,green test =', ' '.join('{0}: {1:05.2f}%'.format(key, 100.0*val/N) for key, val in red_blue_green_test.items())

red,green test      = car: 09.89% toy: 10.06% ballon: 80.05%
red,blue test       = car: 05.30% toy: 47.71% ballon: 46.99%
blue test           = car: 41.69% toy: 15.02% ballon: 43.29%
default test        = car: 05.03% toy: 47.16% ballon: 47.81%
red,blue,green test = car: 04.85% toy: 49.20% ballon: 45.95%

Probabilities match rules!


Footnotes

  1. Distribution Assumption

    Since the rule set is not fully defined, assumptions are made about the probability distributions, most of this is done in multi_val_rule_prob(). Basically any undefined probability will be spread uniformly over the remaining values. This is done for all combinations of keys, and creates a generalized key interface for the random weighted selection.

    Given the example rule set

    d["red","blue","green"] = {
        # {rule_set} : {result: probability}
        frozenset(["red", "green"]): {"ballon": 0.8},
        frozenset(["blue"]): {"toy": 0.15},
        frozenset([]): {"car": 0.05}
    }
    

    this will create the following distributions

    'red'           = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
    'green'         = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
    'blue'          = [('car', 0.425), ('toy', 0.150), ('ballon', 0.425)]
    'blue,red'      = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
    'green,red'     = [('car', 0.098), ('toy', 0.098), ('ballon', 0.800)]
    'blue,green'    = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
    'blue,green,red'= [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
     default        = [('car', 0.050), ('toy', 0.475), ('ballon', 0.475)]
    

    If this is incorrect, please advise.

like image 68
tmthydvnprt Avatar answered Oct 06 '22 15:10

tmthydvnprt