 

Counting matches between two lists, grouped by a condition that splits the original list

I have a list of floats with some hidden "level" information encoded in the magnitude of each float, and I can split the floats into levels like this:

import math
import numpy as np

all_scores = [1.0369411057174144e+22, 2.7997409854370188e+23, 1.296176382146768e+23,
6.7401171871631936e+22, 6.7401171871631936e+22, 2.022035156148958e+24, 8.65845823274041e+23,
1.6435516525621017e+24, 2.307193960221247e+24, 1.285806971089594e+24, 9603539.08653573,
17489013.841076534, 11806185.6660164, 16057293.564414097, 8546268.728385007, 53788629.47091801,
31828243.07349571, 51740168.15200098, 53788629.47091801, 22334836.315934014,
4354.0, 7474.0, 4354.0, 4030.0, 6859.0, 8635.0, 7474.0, 8635.0, 9623.0, 8479.0]

easy, med, hard = [], [], []

for i in all_scores:
    if i > math.exp(50):
        easy.append(i)
    elif i > math.exp(10):
        med.append(i)
    else:
        hard.append(i)

print([easy, med, hard])

[out]:

[[1.0369411057174144e+22, 2.7997409854370188e+23, 1.296176382146768e+23, 6.7401171871631936e+22, 6.7401171871631936e+22, 2.022035156148958e+24, 8.65845823274041e+23, 1.6435516525621017e+24, 2.307193960221247e+24, 1.285806971089594e+24], [9603539.08653573, 17489013.841076534, 11806185.6660164, 16057293.564414097, 8546268.728385007, 53788629.47091801, 31828243.07349571, 51740168.15200098, 53788629.47091801, 22334836.315934014], [4354.0, 7474.0, 4354.0, 4030.0, 6859.0, 8635.0, 7474.0, 8635.0, 9623.0, 8479.0]]
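For reference, the two cut-offs work out to roughly 2.2e4 (e^10) and 5.2e21 (e^50), so every value in the sample sits well away from both. A minimal sketch that evaluates them and wraps the same if/elif test in a helper; `level_of`, `LOW` and `HIGH` are hypothetical names, not part of the original code:

```python
import math

LOW = math.exp(10)   # about 22026.47
HIGH = math.exp(50)  # about 5.18e21

def level_of(score):
    """Return 'easy', 'med' or 'hard' using the same strict > tests as the loop above."""
    if score > HIGH:
        return 'easy'
    elif score > LOW:
        return 'med'
    return 'hard'

print(LOW, HIGH)
# one sample value from each level of all_scores
print(level_of(1.0369411057174144e+22), level_of(9603539.08653573), level_of(9623.0))
```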

I also have another list whose entries correspond position by position to all_scores:

input_scores = [0.0, 2.7997409854370188e+23, 0.0, 6.7401171871631936e+22, 0.0, 0.0, 8.6584582327404103e+23, 0.0, 2.3071939602212471e+24, 0.0, 0.0, 17489013.841076534, 11806185.6660164, 0.0, 8546268.728385007, 0.0, 31828243.073495708, 51740168.152000979, 0.0, 22334836.315934014, 4354.0, 7474.0, 4354.0, 4030.0, 0.0, 8635.0, 0.0, 0.0, 0.0, 8479.0]

I need to check how many of the easy, med and hard scores are matched in input_scores. I can get a boolean for each position of the flat all_scores list like this:

matches = [i == j for i, j in zip(input_scores, all_scores)]
print(matches)

[out]:

[False, True, False, True, False, False, True, False, True, False, False, True, True, False, True, False, True, True, False, True, True, True, True, True, False, True, False, False, False, True]
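Two details make this comparison work. Exact `==` on floats is safe here because the matching entries in input_scores appear to be exact copies: literals such as 8.65845823274041e+23 and 8.6584582327404103e+23 are two spellings of the same double, which the True entries above confirm. And since bool is a subclass of int, sum() over the booleans counts the matches. A small sketch using a few pairs copied from the two lists:

```python
# A few (all_scores, input_scores) pairs copied from the question's lists.
pairs = [
    (8.65845823274041e+23, 8.6584582327404103e+23),  # printed differently, same double
    (31828243.07349571, 31828243.073495708),
    (51740168.15200098, 51740168.152000979),
    (1.296176382146768e+23, 0.0),                     # zero in input_scores: no match
    (6859.0, 0.0),
]

matches = [a == b for a, b in pairs]
print(matches)       # [True, True, True, False, False]
print(sum(matches))  # 3, because True counts as 1
```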

Is there a way to get, for each level (easy/med/hard), the number of matches and the sum of the matched scores?

I have tried this and it works:

# assumes all_scores is ordered as the easy block, then med, then hard
matches = [int(i == j) for i, j in zip(input_scores, all_scores)]

print(sum(matches[:len(easy)]), len(easy), sum(np.array(easy) * matches[:len(easy)]))
print(sum(matches[len(easy):len(easy)+len(med)]), len(med), sum(np.array(med) * matches[len(easy):len(easy)+len(med)]))
print(sum(matches[len(easy)+len(med):]), len(hard), sum(np.array(hard) * matches[len(easy)+len(med):]))

[out]:

4 10 3.52041505391e+24
6 10 143744715.777
6 10 37326.0

But there must be a less verbose way to achieve the same output.
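One way to shorten the slicing version, keeping its assumption that all_scores is laid out as the whole easy block, then med, then hard, is to loop over the three level lists with a running offset. This is a sketch of my own, not either answer's method; it would silently give wrong numbers if the levels were interleaved:

```python
import math
import numpy as np

all_scores = [1.0369411057174144e+22, 2.7997409854370188e+23, 1.296176382146768e+23,
              6.7401171871631936e+22, 6.7401171871631936e+22, 2.022035156148958e+24, 8.65845823274041e+23,
              1.6435516525621017e+24, 2.307193960221247e+24, 1.285806971089594e+24, 9603539.08653573,
              17489013.841076534, 11806185.6660164, 16057293.564414097, 8546268.728385007, 53788629.47091801,
              31828243.07349571, 51740168.15200098, 53788629.47091801, 22334836.315934014,
              4354.0, 7474.0, 4354.0, 4030.0, 6859.0, 8635.0, 7474.0, 8635.0, 9623.0, 8479.0]

input_scores = [0.0, 2.7997409854370188e+23, 0.0, 6.7401171871631936e+22, 0.0, 0.0,
                8.6584582327404103e+23, 0.0, 2.3071939602212471e+24, 0.0,
                0.0, 17489013.841076534, 11806185.6660164, 0.0, 8546268.728385007,
                0.0, 31828243.073495708, 51740168.152000979, 0.0, 22334836.315934014,
                4354.0, 7474.0, 4354.0, 4030.0, 0.0, 8635.0, 0.0, 0.0, 0.0, 8479.0]

# Split into levels with the same thresholds as the question.
easy = [s for s in all_scores if s > math.exp(50)]
med = [s for s in all_scores if math.exp(10) < s <= math.exp(50)]
hard = [s for s in all_scores if s <= math.exp(10)]

# 1 where input_scores reproduces all_scores at the same position, else 0.
matches = np.array([int(i == j) for i, j in zip(input_scores, all_scores)])

# Assumes each level occupies one contiguous slice of `matches`.
results = []
start = 0
for level in (easy, med, hard):
    stop = start + len(level)
    hits = matches[start:stop]
    # (number matched, level size, sum of matched scores)
    results.append((int(hits.sum()), len(level), float(np.dot(level, hits))))
    start = stop

for n_matched, n_total, matched_sum in results:
    print(n_matched, n_total, matched_sum)
```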

alvas asked Mar 09 '23 13:03


2 Answers

Sounds to me like a job for... Counter!

If you haven't come across it yet, Counter is a dict subclass whose .update() method adds the new values to the existing ones instead of replacing them. So:

from collections import Counter

counter = Counter({'a': 2})
counter.update({'a': 3})
counter['a']
> 5
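The code below relies on update() also accepting non-integer values: a bool adds 0 or 1, and a float is summed like any other number. A small check (the keys and values are illustrative only):

```python
from collections import Counter

# Seed each key with 0, as the answer below does, so sums start from a number.
matches = Counter({'easy': 0, 'hard': 0})
totals = Counter({'easy': 0, 'hard': 0})

for level, hit, value in [('easy', True, 2.5), ('easy', False, 4.0), ('hard', True, 7.0)]:
    matches.update({level: hit})                 # True adds 1, False adds 0
    totals.update({level: value if hit else 0})  # only matched values are summed

print(matches['easy'], matches['hard'])  # 1 1
print(totals['easy'], totals['hard'])    # 2.5 7.0
```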

So you can get the result above with the following code:

import math
from collections import Counter

matches, counts, scores = [
    Counter({'easy': 0, 'med': 0, 'hard': 0}) for _ in range(3)
]

for score, inp in zip(all_scores, input_scores):
    category = (
        'easy' if score > math.exp(50) else
        'med' if score > math.exp(10) else
        'hard'
    )
    matches.update({category: score == inp})
    counts.update({category: 1})
    scores.update({category: score if score == inp else 0})

for cat in ('easy', 'med', 'hard'):
    print(matches[cat], counts[cat], scores[cat])
daphtdazz answered Apr 19 '23 23:04


Here is a NumPy solution that uses digitize to assign the categories and bincount to count and sum the matches. As a bonus, the same statistics are also computed for the unmatched entries.

categories = 'hard', 'med', 'easy'

# get group membership by splitting at e^10 and e^50
# the 'right' keyword tells digitize to include right boundaries
cat_map = np.digitize(all_scores, np.exp((10, 50)), right=True)
# cat_map has a zero in all the 'hard' places of all_scores
# a one in the 'med' places and a two in the 'easy' places

# add a fourth group to mark all non-matches
# we have to force at least one np.array for element-by-element
# comparison to work
cat_map[np.asanyarray(all_scores) != input_scores] = 3

# count
numbers = np.bincount(cat_map)
# count again, this time using all_scores as weights
sums = np.bincount(cat_map, all_scores)

# print
for c, n, s in zip(categories + ('unmatched',), numbers, sums):
    print('{:12}  {:2d}  {:6.4g}'.format(c, n, s))

# output:
#
# hard           6  3.733e+04
# med            6  1.437e+08
# easy           4  3.52e+24
# unmatched     14  5.159e+24
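To see what the two NumPy calls do in isolation, here is a toy run on values placed below, on and above the bin edges (the numbers are illustrative, not from the question). With right=True a value equal to an edge lands in the lower bin, which matches the question's strict > tests; bincount with weights sums the weights per bin instead of counting.

```python
import numpy as np

edges = np.array([10.0, 50.0])
values = np.array([5.0, 10.0, 20.0, 50.0, 80.0])

# right=True: bin i holds edges[i-1] < x <= edges[i], so x == 10.0 goes to bin 0
bins = np.digitize(values, edges, right=True)
print(bins)    # [0 0 1 1 2]

counts = np.bincount(bins)                 # how many values fell in each bin
sums = np.bincount(bins, weights=values)   # sum of the values in each bin
print(counts)  # [2 2 1]
print(sums)    # [15. 70. 80.]
```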
Paul Panzer answered Apr 20 '23 01:04