
Summarize and plot list of ndarrays according to chosen values

I have a list of ndarrays:

list1 = [t1, t2, t3, t4, t5]

Each t is a two-column array of [class, probability] rows:

t1 = np.array([[10,0.1],[30,0.05],[30,0.1],[20,0.1],[10,0.05],[10,0.05],[0,0.5],[20,0.05],[10,0.0]], np.float64)

t2 = np.array([[0,0.05],[0,0.05],[30,0],[10,0.25],[10,0.2],[10,0.25],[20,0.1],[20,0.05],[10,0.05]], np.float64)

...

For every t in the list, I want to average the second-column values that share the same first element:

t1out = [[0,0.5],[10,(0.1+0.05+0.05+0)/4],[20,(0.1+0.05)/2],[30,0.075]]

t2out = [[0,0.05],[10,0.1875],[20,0.075],[30,0]]

....

After generating these averaged arrays (t1out, t2out, ...), I want to plot the probabilities over the classes for each t, where the first elements are the classes (0, 10, 20, 30) and the second elements are the probabilities with which these classes occur (e.g. 0.1, 0.7, 0.15, 0). I am looking for something like a histogram or a probability distribution in the form of a bar plot, such as:

plt.bar([classes],[probabilities])

plt.bar([item[0] for item in t1out],[item[1] for item in t1out])
asked Nov 25 '25 21:11 by Zed


1 Answer

This is how you can calculate that with NumPy:

import numpy as np

def mean_by_class(t, classes=None):
    # Classes should be passed if you want to ensure
    # that all classes are in the output even if they
    # are not in the current t vector
    if classes is None:
        classes = np.unique(t[:, 0])
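    # Bin edges: one bin per class, plus a closing edge after the last class
    bins = np.r_[classes, classes[-1] + 1]
    # Number of rows that fall in each class
    h, _ = np.histogram(t[:, 0], bins)
    # Index of the class each row belongs to
    # (assumes the values in t[:, 0] match entries of classes exactly)
    d = np.digitize(t[:, 0], bins, right=True)
    # Sum the second column per class, then divide by the counts
    out = np.zeros(len(classes), t.dtype)
    np.add.at(out, d, t[:, 1])
    out /= h.clip(min=1)  # clip avoids dividing by zero for empty classes
    return np.c_[classes, out]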

t1 = np.array([[10, 0.1 ], [30, 0.05], [30, 0.1 ],
               [20, 0.1 ], [10, 0.05], [10, 0.05],
               [ 0, 0.5 ], [20, 0.05], [10, 0.0 ]],
              dtype=np.float64)
print(mean_by_class(t1))
# [[ 0.     0.5  ]
#  [10.     0.05 ]
#  [20.     0.075]
#  [30.     0.075]]
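
The question asks for this on the whole list, not just on t1. A minimal sketch of one way to do that follows. It swaps in np.searchsorted and np.bincount for the histogram/digitize pair above, so it is an alternative rather than the function shown here; the helper name mean_by_class_bincount is made up for illustration, and it assumes that classes is sorted and that every value in the first column appears in it.

```python
import numpy as np

def mean_by_class_bincount(t, classes):
    # Hypothetical helper: position of each row's class value in `classes`.
    # Assumes `classes` is sorted and contains every value in t[:, 0].
    idx = np.searchsorted(classes, t[:, 0])
    # Per-class sum of the second column and per-class row count
    sums = np.bincount(idx, weights=t[:, 1], minlength=len(classes))
    counts = np.bincount(idx, minlength=len(classes))
    # Classes with no rows keep a mean of 0 instead of dividing by zero
    means = sums / np.maximum(counts, 1)
    return np.column_stack([classes, means])

t1 = np.array([[10, 0.1], [30, 0.05], [30, 0.1],
               [20, 0.1], [10, 0.05], [10, 0.05],
               [0, 0.5], [20, 0.05], [10, 0.0]], dtype=np.float64)
t2 = np.array([[0, 0.05], [0, 0.05], [30, 0],
               [10, 0.25], [10, 0.2], [10, 0.25],
               [20, 0.1], [20, 0.05], [10, 0.05]], dtype=np.float64)
list1 = [t1, t2]

# Use one shared class set so every output has the same rows
classes = np.unique(np.concatenate([t[:, 0] for t in list1]))
outs = [mean_by_class_bincount(t, classes) for t in list1]
for out in outs:
    print(out)
```

Sharing one class set across the list keeps the outputs aligned row by row, which makes them easier to compare or plot together.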

As a side note, it may not be the best choice to store class values, which are integers, in a float array. You could consider using a structured array instead, for example like this:

import numpy as np

def mean_by_class(t, classes=None):
    if classes is None:
        classes = np.unique(t['class'])
    bins = np.r_[classes, classes[-1] + 1]
    h, _ = np.histogram(t['class'], bins)
    d = np.digitize(t['class'], bins, right=True)
    out = np.zeros(len(classes), t.dtype)
    out['class'] = classes
    np.add.at(out['p'], d, t['p'])
    out['p'] /= h.clip(min=1)
    return out

t1 = np.array([(10, 0.1 ), (30, 0.05), (30, 0.1 ),
               (20, 0.1 ), (10, 0.05), (10, 0.05),
               ( 0, 0.5 ), (20, 0.05), (10, 0.0 )],
              dtype=[('class', np.int32), ('p', np.float64)])
print(mean_by_class(t1))
# [( 0, 0.5  ) (10, 0.05 ) (20, 0.075) (30, 0.075)]
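
For the plotting part of the question, the following is a rough sketch of a grouped bar plot with one bar per t inside each class. It expects the two-column (class, mean) arrays returned by the first version of mean_by_class, all built with the same classes. The names grouped_bar_offsets and plot_class_means and the group_width default are illustrative assumptions, not part of the answer above.

```python
import numpy as np

def grouped_bar_offsets(n_series, group_width=8.0):
    # Hypothetical helper: x offsets that place n_series bars side by side,
    # centred on each class value. group_width is an assumed default that
    # suits classes spaced 10 apart.
    bar_width = group_width / n_series
    offsets = (np.arange(n_series) - (n_series - 1) / 2) * bar_width
    return offsets, bar_width

def plot_class_means(outs, labels=None):
    # matplotlib is imported here so the helper above works without it
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    classes = outs[0][:, 0]
    offsets, width = grouped_bar_offsets(len(outs))
    for i, out in enumerate(outs):
        label = labels[i] if labels else f"t{i + 1}"
        # One bar series per input array, shifted within each class group
        ax.bar(out[:, 0] + offsets[i], out[:, 1], width=width, label=label)
    ax.set_xticks(classes)
    ax.set_xlabel("class")
    ax.set_ylabel("mean probability")
    ax.legend()
    return fig, ax

offsets, width = grouped_bar_offsets(2)
print(offsets, width)
```

Calling plot_class_means on a list of such arrays and then plt.show() should display the bars. For a single t, plt.bar(out[:, 0], out[:, 1]) as in the question is enough.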
answered Nov 28 '25 12:11 by jdehesa


