Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Get first letter with maximum occurence of a string

I would like to get the first letter with the maximum occurence of a string.

For instance:

 "google" -> g  
 "azerty" -> a  
 "bbbaaa" -> b

I already have a working code, using OrdererDict() to avoid automatic keys rearangement:

from collections import OrderedDict

sentence = "google"

d = OrderedDict()

for letter in sentence:
    if letter not in d.keys():
        d[letter] = sentence.count(letter)

print(max(d, key=d.get)) # g

but I'm looking for a possible one-liner or more elegant solution (if it's possible).

Note: I already tried to use Counter() but it doesn't work, since dict in python don't remember the order that keys were inserted.

e.g

from collections import Counter

sentence = "bbbaaa"

c = Counter(sentence)
print(c.most_common()[0][0]) # have 50% chances of printing 'a' rather than 'b'.

Bonus question: Can someone explains why OrderedDict() are not the default dictionary behavior in python?

like image 251
Kruupös Avatar asked Aug 28 '16 16:08

Kruupös


People also ask

How do you find the maximum occurring character in a string python?

We find maximum occurring character by using max() on values.


2 Answers

The documentation for collections.OrderedDict actually has a recipe for an OrderedCounter:

In [5]: from collections import Counter, OrderedDict

In [6]: class OrderedCounter(Counter, OrderedDict):
   ...:     pass
   ...:

In [7]: OrderedCounter("google").most_common()[0][0]
Out[7]: 'g'
like image 167
Blender Avatar answered Nov 14 '22 22:11

Blender


Probably not very fast, but one-liner!

>>> s = "aaabbbbc"
>>> sorted(s, key=lambda c: (-s.count(c), s.index(c)))[0]
'b'

Edit

Even shorter, thanks to @Ohad Eytan's comment:

>>> min(s, key=lambda c: (-s.count(c), s.index(c)))
'b'

Benchmark

Feeling bored tonight, so I benchmarked (using timeit) test @Joohwan's most_common_char() solution (mostcc), @Blender's OrderedCounter solution (odict) and my own one liner solution (onelin, using the min variant). The fastest solution was consistently mostcc: up to ~10x faster than onelin for long strings containing few different characters, and up to ~4x faster than odict for very short strings. For short strings or strings with little repeated chars, onelin beats odict (otherwise, it's the reverse). Here are the details (Length=length of the string, #chars=number of different unicode chars to randomly pick from for each char, mostcc=time to execute 10,000 times mostcc, odict=how much longer odict was compared to mostcc, onelin=how much longer oneline was compared to mostcc).

Length  #chars  mostcc odict  onelin
10      10:     0.08s  3.76x  1.61x
10      100:    0.10s  3.57x  1.27x
10      1000:   0.12s  3.12x  1.34x
100     10:     0.43s  1.96x  3.29x
100     100:    0.59s  2.16x  2.18x
100     1000:   0.80s  1.92x  1.72x
1000    10:     3.48s  1.56x  9.79x
1000    100:    3.44s  1.72x  6.43x
1000    1000:   6.55s  1.68x  3.30x
like image 26
MiniQuark Avatar answered Nov 14 '22 22:11

MiniQuark