Efficiently counting items in large Python lists

I have two very large Python lists that look like this:

List A: [0,0,0,0,0,0,0,1,1,1,1,2,2,3,3,3,4.........]
List B: [0,0,0,0,0,0,2,2,2,2,3,3,4,4.........]

These lists go on to very large numbers, but I specify a maximum value, say 100, and can discard everything above it.

Now I need to calculate, for each value (0, 1, 2, ..., 100), the ratio: occurrences in list A / occurrences in list B. Since this ratio is not always defined, I decided to calculate it only if the value occurs more than 5 times in each list. If that condition is not met, I want to combine the value's occurrences with those of the previous value(s) until the condition holds, and give the same ratio to all of the combined values. For example, for the lists above, I want to create a Series that looks like this:

0 : 7/6 = 1.166
1 : 9/6 = 1.5
2 : 9/6 = 1.5
3 : 9/6 = 1.5
.
.
.
100 : some_number
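
One possible reading of this grouping rule, as a minimal sketch rather than a definitive implementation: walk the values in order, accumulate the counts from both lists, and close a group as soon as both totals exceed the threshold. The function name `grouped_ratios`, its parameters, and the choice to give `nan` to a trailing group that never reaches the threshold are assumptions made for illustration. The sample lists are only the visible prefix of the lists above, up to the value 3.

```python
from collections import Counter


def grouped_ratios(a, b, max_value, min_count=5):
    """Ratio count_a / count_b per value, merging consecutive values
    until both accumulated counts exceed min_count."""
    count_a, count_b = Counter(a), Counter(b)
    ratios = {}
    pending = []            # values accumulated into the current group
    total_a = total_b = 0

    for value in range(max_value + 1):
        pending.append(value)
        total_a += count_a[value]
        total_b += count_b[value]
        # Close the group once both totals are strictly above the threshold.
        if total_a > min_count and total_b > min_count:
            ratio = total_a / total_b
            for v in pending:
                ratios[v] = ratio
            pending, total_a, total_b = [], 0, 0

    # Assumption: a trailing group that never reaches the threshold gets nan.
    for v in pending:
        ratios[v] = float("nan")
    return ratios


# Visible prefix of list A and list B, up to the value 3.
a = [0] * 7 + [1] * 4 + [2] * 2 + [3] * 3
b = [0] * 6 + [2] * 4 + [3] * 2
result = grouped_ratios(a, b, max_value=3)
print(result)
```

Here the value 0 forms its own group (7/6), while 1, 2 and 3 are merged into one group (9/6 = 1.5).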

Triple Nipple asked Sep 06 '18 12:09

1 Answer

You can use a Counter to count the occurrences and takewhile to meet your requirement of stopping at 100.

Note that values which do not appear in list b are not discarded; their ratio is reported as nan instead.

from collections import Counter
from itertools import takewhile

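def get_ratios(a, b, max_=None, min_count=0):
    # Keep only the leading run of values <= max_ (relies on sorted input).
    if max_ is not None:
        a = takewhile(lambda x: x <= max_, a)
        b = takewhile(lambda x: x <= max_, b)

    count_a, count_b = Counter(a), Counter(b)

    # The ratio is nan when a value never occurs in b. Values seen fewer
    # than min_count times in either list are left out of the result.
    return {k: float('nan') if not count_b[k] else count_a[k] / count_b[k]
            for k in set(count_a) | set(count_b)
            if count_a[k] >= min_count <= count_b[k]}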

Example

a = [1, 1, 1, 2, 3, 101]
b = [1, 1, 2, 2, 4, 101]

print(get_ratios(a, b, max_=100))

Output

{ 1: 1.5,
  2: 0.5,
  3: nan,
  4: 0.0 }
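
One caveat worth illustrating: takewhile stops at the first element that fails the predicate, so using it to cut off at a maximum only works when the input is sorted, as the lists in the question appear to be. The short sketch below, with made-up sample lists, contrasts it with a plain filter that does not depend on ordering.

```python
from itertools import takewhile

sorted_values = [1, 2, 3, 101, 102]
unsorted_values = [1, 101, 2, 3]  # hypothetical unsorted input

# On sorted input, takewhile keeps everything up to the maximum.
truncated_sorted = list(takewhile(lambda x: x <= 100, sorted_values))

# On unsorted input, it stops at 101 and silently drops the later 2 and 3.
truncated_unsorted = list(takewhile(lambda x: x <= 100, unsorted_values))

# A comprehension scans the whole list, so ordering does not matter.
filtered = [x for x in unsorted_values if x <= 100]

print(truncated_sorted)    # [1, 2, 3]
print(truncated_unsorted)  # [1]
print(filtered)            # [1, 2, 3]
```

The trade-off is that takewhile can stop early on a very long sorted list, while the comprehension always reads every element.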

To ignore under-represented values, you can set min_count as mentioned in your question. Since the check is count >= min_count, requiring more than 5 occurrences corresponds to min_count=6.

Notice that I didn't fill in empty slots with the ratio of the previous value. Unless you have a very specific use case that requires it, I recommend you do not, as this would mix actual data with extrapolated data. It is better to fall back on the previous value at lookup time, when a value is not found, than to pollute the actual data.
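
As a rough sketch of that lookup-time fallback, assuming non-negative integer keys: the helper below leaves the ratios dict untouched and only walks back to the nearest smaller key with a usable ratio when asked. The name `lookup_with_fallback` is hypothetical and not part of the code above.

```python
import math


def lookup_with_fallback(ratios, value):
    """Return ratios[value], or the ratio of the nearest smaller key
    whose ratio is present and not nan. Returns nan if none exists."""
    for key in range(value, -1, -1):
        ratio = ratios.get(key)
        if ratio is not None and not math.isnan(ratio):
            return ratio
    return float("nan")


# Same values as the example output above.
ratios = {1: 1.5, 2: 0.5, 3: float("nan"), 4: 0.0}

print(lookup_with_fallback(ratios, 3))  # 0.5, falls back to key 2
print(lookup_with_fallback(ratios, 6))  # 0.0, falls back to key 4
print(lookup_with_fallback(ratios, 0))  # nan, nothing to fall back on
```

Because the fallback happens only when reading, the stored ratios remain exactly what was measured.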

Olivier Melançon answered Oct 07 '22 14:10