I would like to get the first letter with the maximum occurence of a string.
For instance:
"google" -> g
"azerty" -> a
"bbbaaa" -> b
I already have a working code, using OrdererDict() to avoid automatic keys rearangement:
from collections import OrderedDict
sentence = "google"
d = OrderedDict()
for letter in sentence:
if letter not in d.keys():
d[letter] = sentence.count(letter)
print(max(d, key=d.get)) # g
but I'm looking for a possible one-liner or more elegant solution (if it's possible).
Note: I already tried to use Counter() but it doesn't work, since dict in python don't remember the order that keys were inserted.
e.g
from collections import Counter
sentence = "bbbaaa"
c = Counter(sentence)
print(c.most_common()[0][0]) # have 50% chances of printing 'a' rather than 'b'.
Bonus question: Can someone explains why OrderedDict() are not the default dictionary behavior in python?
We find maximum occurring character by using max() on values.
The documentation for collections.OrderedDict
actually has a recipe for an OrderedCounter
:
In [5]: from collections import Counter, OrderedDict
In [6]: class OrderedCounter(Counter, OrderedDict):
...: pass
...:
In [7]: OrderedCounter("google").most_common()[0][0]
Out[7]: 'g'
Probably not very fast, but one-liner!
>>> s = "aaabbbbc"
>>> sorted(s, key=lambda c: (-s.count(c), s.index(c)))[0]
'b'
Edit
Even shorter, thanks to @Ohad Eytan's comment:
>>> min(s, key=lambda c: (-s.count(c), s.index(c)))
'b'
Benchmark
Feeling bored tonight, so I benchmarked (using timeit
) test @Joohwan's most_common_char()
solution (mostcc), @Blender's OrderedCounter
solution (odict) and my own one liner solution (onelin, using the min
variant). The fastest solution was consistently mostcc: up to ~10x faster than onelin for long strings containing few different characters, and up to ~4x faster than odict for very short strings. For short strings or strings with little repeated chars, onelin beats odict (otherwise, it's the reverse). Here are the details (Length=length of the string, #chars=number of different unicode chars to randomly pick from for each char, mostcc=time to execute 10,000 times mostcc, odict=how much longer odict was compared to mostcc, onelin=how much longer oneline was compared to mostcc).
Length #chars mostcc odict onelin
10 10: 0.08s 3.76x 1.61x
10 100: 0.10s 3.57x 1.27x
10 1000: 0.12s 3.12x 1.34x
100 10: 0.43s 1.96x 3.29x
100 100: 0.59s 2.16x 2.18x
100 1000: 0.80s 1.92x 1.72x
1000 10: 3.48s 1.56x 9.79x
1000 100: 3.44s 1.72x 6.43x
1000 1000: 6.55s 1.68x 3.30x
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With