I have two very large Python lists that look like this:
List A: [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 3, 3, 3, 4, ...]
List B: [0, 0, 0, 0, 0, 0, 2, 2, 2, 2, 3, 3, 4, 4, ...]
These lists go on to very large numbers, but I specify a maximum value, say 100, and can discard everything after it.
Now I need to calculate, for each value (0, 1, 2, ..., 100), the ratio of its occurrences in list A to its occurrences in list B. Since this ratio is not always meaningful (or even defined), I decided to calculate it only if the value occurs more than 5 times in each list. If that condition is not met, I want to keep adding the occurrences of the following value(s) until the pooled counts satisfy the condition, and then give every pooled value the same ratio. For example, for the lists above, I want to create a Series that looks like this:
0 : 7/6 ≈ 1.167
1 : 9/6 = 1.5
2 : 9/6 = 1.5
3 : 9/6 = 1.5
...
100 : some_number
Here 1, 2 and 3 are pooled: they occur 4 + 2 + 3 = 9 times in A and 0 + 4 + 2 = 6 times in B.
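The pooling rule itself can be sketched as follows. This is a minimal sketch, not the asker's or answerer's code: `pooled_ratios` and its `threshold` parameter are hypothetical names, "more than 5" is read as a strict `> threshold` test, and values at the end that never reach the threshold are simply left out (the question does not say what should happen to them).

```python
from collections import Counter


def pooled_ratios(a, b, max_value=100, threshold=5):
    """Hypothetical helper: walk the values 0..max_value in order, pooling
    consecutive values until both lists have more than `threshold`
    occurrences, then give every pooled value the pooled ratio."""
    count_a, count_b = Counter(a), Counter(b)  # a missing key counts as 0
    ratios = {}
    pending = []          # values waiting for the pool to fill up
    pooled_a = pooled_b = 0
    for value in range(max_value + 1):
        pending.append(value)
        pooled_a += count_a[value]
        pooled_b += count_b[value]
        if pooled_a > threshold and pooled_b > threshold:
            ratio = pooled_a / pooled_b
            for v in pending:
                ratios[v] = ratio
            pending = []
            pooled_a = pooled_b = 0
    # Trailing values whose pool never passed the threshold get no ratio
    return ratios


# The example lists from the question, cut off after the first 4s
a = [0] * 7 + [1] * 4 + [2] * 2 + [3] * 3 + [4]
b = [0] * 6 + [2] * 4 + [3] * 2 + [4] * 2
print(pooled_ratios(a, b, max_value=3))
# {0: 1.1666666666666667, 1: 1.5, 2: 1.5, 3: 1.5}
```

Because only the keys 0..max_value are looked up, values above the maximum are ignored without the lists having to be sorted. If pandas is available, `pd.Series(...)` accepts the resulting dict directly.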
len() method: there is a built-in function called len() for getting the total number of items in a list, tuple, array, dictionary, etc. It takes the container as its argument and returns its length, so len(my_list) gives the number of elements in my_list.
If you want to count how often several different items occur in a list, you can call count() in a loop. However, every count() call makes a separate pass over the whole list, which can be catastrophic for performance on large lists. Use the Counter class from the collections module instead, which counts every item in a single pass.
The Python count() method counts the number of times a particular item appears in a list or, when used on a string, the number of non-overlapping times a substring appears in the larger string.
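A short sketch of the three approaches above, on a small made-up list: `len()` for the total, `list.count()` for one item, and `collections.Counter` for all items at once.

```python
from collections import Counter

values = [0, 0, 0, 1, 1, 2]  # made-up example data

# len() returns the total number of items
total = len(values)

# list.count() scans the whole list for a single item
zeros = values.count(0)

# Calling count() in a loop rescans the list once per distinct value...
slow = {v: values.count(v) for v in set(values)}

# ...whereas Counter tallies every value in a single pass
fast = Counter(values)

print(total, zeros, dict(fast))  # 6 3 {0: 3, 1: 2, 2: 1}
```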
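You can use a Counter to count the occurrences and takewhile to meet your requirement of stopping at 100. This relies on the lists being sorted in ascending order, as in your example, because takewhile stops at the first value above the limit.
Instead of discarding values that do not appear in list b, notice how I used nan for them.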
```python
from collections import Counter
from itertools import takewhile


def get_ratios(a, b, max_=None, min_count=0):
    # Stop reading each (sorted) list at the first value above max_
    if max_ is not None:
        a = takewhile(lambda x: x <= max_, a)
        b = takewhile(lambda x: x <= max_, b)
    count_a, count_b = Counter(a), Counter(b)
    # A Counter returns 0 for missing keys, so values absent from b get nan;
    # only values seen at least min_count times in both lists are kept
    return {k: float('nan') if not count_b[k] else count_a[k] / count_b[k]
            for k in sorted(set(count_a) | set(count_b))
            if count_a[k] >= min_count and count_b[k] >= min_count}


a = [1, 1, 1, 2, 3, 101]
b = [1, 1, 2, 2, 4, 101]
print(get_ratios(a, b, max_=100))
# {1: 1.5, 2: 0.5, 3: nan, 4: 0.0}
```
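Since takewhile stops at the first element that fails its predicate, it trims correctly only when the input is sorted ascending. A small sketch with made-up lists shows the difference, and a plain list comprehension that keeps every value up to the limit when the order is not guaranteed:

```python
from itertools import takewhile

sorted_values = [0, 1, 2, 100, 101, 102]
unsorted_values = [0, 101, 1, 2]

# Sorted input: everything up to the limit is kept
print(list(takewhile(lambda x: x <= 100, sorted_values)))    # [0, 1, 2, 100]

# Unsorted input: iteration stops at 101, so 1 and 2 are lost
print(list(takewhile(lambda x: x <= 100, unsorted_values)))  # [0]

# A filter examines every element regardless of order
print([x for x in unsorted_values if x <= 100])              # [0, 1, 2]
```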
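To ignore under-represented values, you can set min_count to 5, as mentioned in your question. Note that the test is inclusive (at least min_count occurrences in each list), so for "more than 5" you would pass min_count=6.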
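The threshold used by get_ratios is inclusive. A minimal check on made-up counts (`kept_values` is a hypothetical helper that applies the same test) shows that a value occurring exactly 5 times in each list survives `min_count=5` but not `min_count=6`:

```python
from collections import Counter

# Made-up data: value 1 occurs exactly 5 times in each list
a = [0] * 7 + [1] * 5
b = [0] * 6 + [1] * 5
count_a, count_b = Counter(a), Counter(b)


def kept_values(min_count):
    # Same inclusive test as get_ratios: both counts must be >= min_count
    return sorted(k for k in set(count_a) | set(count_b)
                  if count_a[k] >= min_count and count_b[k] >= min_count)


print(kept_values(5))  # [0, 1]  (at least 5 occurrences)
print(kept_values(6))  # [0]     (more than 5 occurrences)
```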
Notice I didn't fill in empty slots with the ratio of the previous value. Unless you have a very specific use case that requires it, I recommend you don't, as this would mix actual data with extrapolated data. It is better to fall back on the previous value at lookup time, when a value is not found, than to pollute the actual data.
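Falling back at lookup time could look like the sketch below. `ratio_or_previous` is a hypothetical helper, not part of the answer: it returns the stored ratio when there is one, otherwise the ratio of the closest smaller value, and it never writes the extrapolated value back into the dict.

```python
import math
from bisect import bisect_right


def ratio_or_previous(ratios, value):
    """Hypothetical helper: look up `value` in `ratios`, falling back to the
    closest smaller key. `ratios` itself is never modified."""
    if value in ratios:
        return ratios[value]
    keys = sorted(ratios)               # re-sorted on every miss; fine for a sketch
    i = bisect_right(keys, value)
    if i == 0:
        return math.nan                 # no smaller value to fall back on
    return ratios[keys[i - 1]]


ratios = {0: 7 / 6, 2: 1.5, 5: 0.8}    # made-up ratios
print(ratio_or_previous(ratios, 3))    # 1.5 (falls back to 2)
print(ratio_or_previous(ratios, -1))   # nan
```

For many lookups, keeping a pre-sorted key list instead of sorting on each miss would avoid the repeated O(n log n) cost.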