I have this list of 5 sequences of numbers:
['123', '134', '234', '214', '223']
and I want to obtain the percentage of each number 1, 2, 3, 4 in the ith position of each sequence. For example, the numbers at the 0th position of these 5 sequences are 1 1 2 2 2, so I need to calculate the percentage of 1, 2, 3, 4 in this sequence of numbers and return the percentages as the 0th element of a new list.
['123', '134', '234', '214', '223']
0th position: 1 1 2 2 2, the percentages of 1, 2, 3, 4 are respectively: [0.4, 0.6, 0.0, 0.0]
1st position: 2 3 3 1 2, the percentages of 1, 2, 3, 4 are respectively: [0.2, 0.4, 0.4, 0.0]
2nd position: 3 4 4 4 3, the percentages of 1, 2, 3, 4 are respectively: [0.0, 0.0, 0.4, 0.6]
The desired result is to return:
[[0.4, 0.6, 0.0, 0.0], [0.2, 0.4, 0.4, 0.0], [0.0, 0.0, 0.4, 0.6]]
My attempt so far:
list(zip(*['123', '134', '234', '214', '223']))
Result:
[('1', '1', '2', '2', '2'), ('2', '3', '3', '1', '2'), ('3', '4', '4', '4', '3')]
But I got stuck here: I don't know how to calculate the percentage of each of the numbers 1, 2, 3, 4 in every tuple and then obtain the desired result. Any suggestion is appreciated!
Starting from your approach, you could do the rest with a Counter:
from collections import Counter
for item in zip(*['123', '134', '234', '214', '223']):
    c = Counter(item)
    total = sum(c.values())
    percent = {key: value/total for key, value in c.items()}
    print(percent)
    # convert to list, in the order of the digits 1, 2, 3, 4
    percent_list = [percent.get(str(i), 0.0) for i in range(1, 5)]
    print(percent_list)
which prints (on Python 3.7+, where dicts keep insertion order)
{'1': 0.4, '2': 0.6}
[0.4, 0.6, 0.0, 0.0]
{'2': 0.4, '3': 0.4, '1': 0.2}
[0.2, 0.4, 0.4, 0.0]
{'3': 0.4, '4': 0.6}
[0.0, 0.0, 0.4, 0.6]
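Since the question asks for a returned list rather than printed output, the same Counter idea can be wrapped in a small function. This is only a sketch: the name position_percentages and its symbols parameter are made up for illustration, and it assumes the digits of interest are exactly '1' to '4'.

```python
from collections import Counter


def position_percentages(sequences, symbols="1234"):
    """Return one list per position with the fraction of each symbol."""
    result = []
    for column in zip(*sequences):   # column = all characters at one position
        counts = Counter(column)     # missing symbols count as 0
        total = len(column)          # number of sequences
        result.append([counts[s] / total for s in symbols])
    return result


print(position_percentages(['123', '134', '234', '214', '223']))
# [[0.4, 0.6, 0.0, 0.0], [0.2, 0.4, 0.4, 0.0], [0.0, 0.0, 0.4, 0.6]]
```

Indexing a Counter with a key it has never seen gives 0 instead of raising KeyError, so no .get fallback is needed here.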
You could start by creating the zipped list as you did:
zipped = zip(*l)
then map a collections.Counter (from collections import Counter) over it to get the counts of each item in the results from zip:
counts = map(Counter, zipped)
and then go through it, building for each Counter a list of its counts divided by the total number of items it counted:
res = [[c[i]/sum(c.values()) for i in '1234'] for c in counts]
print(res)
[[0.4, 0.6, 0.0, 0.0], [0.2, 0.4, 0.4, 0.0], [0.0, 0.0, 0.4, 0.6]]
If you are a one-liner kind of person, mush the first two in the comprehension to get this in one line:
res = [[c[i]/sum(c.values()) for i in '1234'] for c in map(Counter, zip(*l))]
Additionally, as noted in a comment, if you don't know the elements ahead of time, sorted(set(''.join(l))) could replace '1234'.
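As a rough illustration of that last suggestion, the sketch below derives the symbols from the data and keeps them as dictionary keys, so each fraction stays labeled. The variable names symbols and res are only illustrative, and using len(l) as the denominator relies on zip producing columns with one character per sequence.

```python
from collections import Counter

l = ['123', '134', '234', '214', '223']

# Collect every distinct character that occurs anywhere in the input.
symbols = sorted(set(''.join(l)))

# One dict per position, mapping each symbol to its fraction of the column.
res = [{s: c[s] / len(l) for s in symbols} for c in map(Counter, zip(*l))]

for position, fractions in enumerate(res):
    print(position, fractions)
```

If the sequences have different lengths, zip stops at the shortest one, so positions beyond that length are silently left out.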