Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Find the most common element in a list

Tags:

python

list

People also ask

How do I find the most common element in a list Python?

Use the most_common() of Counter to Find the Most Common Elements of a List in Python. In Python 2.7+, use the Counter() command to find the most common list elements in Python.

How do you find common elements in a list?

Method 1:Using Set's & property Convert the lists to sets and then print set1&set2. set1&set2 returns the common elements set, where set1 is the list1 and set2 is the list2. Below is the Python3 implementation of the above approach: Python3.

How do I find the most common number in a list?

If you just want to find most common number in a list, you can use a simple formula. Select a blank cell, here is C1, type this formula =MODE(A1:A13), and then press Enter key to get the most common number in the list.

What is the most common occurring element?

Hydrogen is the most abundant element in the Universe; helium is second. However, after this, the rank of abundance does not continue to correspond to the atomic number; oxygen has abundance rank 3, but atomic number 8.


A simpler one-liner:

def most_common(lst):
    return max(set(lst), key=lst.count)

Borrowing from here, this can be used with Python 2.7:

from collections import Counter

def Most_Common(lst):
    data = Counter(lst)
    return data.most_common(1)[0][0]

Works around 4-6 times faster than Alex's solutions, and is 50 times faster than the one-liner proposed by newacct.

To retrieve the element that occurs first in the list in case of ties:

def most_common(lst):
    data = Counter(lst)
    return max(lst, key=data.get)

With so many solutions proposed, I'm amazed nobody's proposed what I'd consider an obvious one (for non-hashable but comparable elements) -- [itertools.groupby][1]. itertools offers fast, reusable functionality, and lets you delegate some tricky logic to well-tested standard library components. Consider for example:

import itertools
import operator

def most_common(L):
  # get an iterable of (item, iterable) pairs
  SL = sorted((x, i) for i, x in enumerate(L))
  # print 'SL:', SL
  groups = itertools.groupby(SL, key=operator.itemgetter(0))
  # auxiliary function to get "quality" for an item
  def _auxfun(g):
    item, iterable = g
    count = 0
    min_index = len(L)
    for _, where in iterable:
      count += 1
      min_index = min(min_index, where)
    # print 'item %r, count %r, minind %r' % (item, count, min_index)
    return count, -min_index
  # pick the highest-count/earliest item
  return max(groups, key=_auxfun)[0]

This could be written more concisely, of course, but I'm aiming for maximal clarity. The two print statements can be uncommented to better see the machinery in action; for example, with prints uncommented:

print most_common(['goose', 'duck', 'duck', 'goose'])

emits:

SL: [('duck', 1), ('duck', 2), ('goose', 0), ('goose', 3)]
item 'duck', count 2, minind 1
item 'goose', count 2, minind 0
goose

As you see, SL is a list of pairs, each pair an item followed by the item's index in the original list (to implement the key condition that, if the "most common" items with the same highest count are > 1, the result must be the earliest-occurring one).

groupby groups by the item only (via operator.itemgetter). The auxiliary function, called once per grouping during the max computation, receives and internally unpacks a group - a tuple with two items (item, iterable) where the iterable's items are also two-item tuples, (item, original index) [[the items of SL]].

Then the auxiliary function uses a loop to determine both the count of entries in the group's iterable, and the minimum original index; it returns those as combined "quality key", with the min index sign-changed so the max operation will consider "better" those items that occurred earlier in the original list.

This code could be much simpler if it worried a little less about big-O issues in time and space, e.g....:

def most_common(L):
  groups = itertools.groupby(sorted(L))
  def _auxfun((item, iterable)):
    return len(list(iterable)), -L.index(item)
  return max(groups, key=_auxfun)[0]

same basic idea, just expressed more simply and compactly... but, alas, an extra O(N) auxiliary space (to embody the groups' iterables to lists) and O(N squared) time (to get the L.index of every item). While premature optimization is the root of all evil in programming, deliberately picking an O(N squared) approach when an O(N log N) one is available just goes too much against the grain of scalability!-)

Finally, for those who prefer "oneliners" to clarity and performance, a bonus 1-liner version with suitably mangled names:-).

from itertools import groupby as g
def most_common_oneliner(L):
  return max(g(sorted(L)), key=lambda(x, v):(len(list(v)),-L.index(x)))[0]

What you want is known in statistics as mode, and Python of course has a built-in function to do exactly that for you:

>>> from statistics import mode
>>> mode([1, 2, 2, 3, 3, 3, 3, 3, 4, 5, 6, 6, 6])
3

Note that if there is no "most common element" such as cases where the top two are tied, this will raise StatisticsError, because statistically speaking, there is no mode in this case.


Without the requirement about the lowest index, you can use collections.Counter for this:

from collections import Counter

a = [1936, 2401, 2916, 4761, 9216, 9216, 9604, 9801] 

c = Counter(a)

print(c.most_common(1)) # the one most common element... 2 would mean the 2 most common
[(9216, 2)] # a set containing the element, and it's count in 'a'

If they are not hashable, you can sort them and do a single loop over the result counting the items (identical items will be next to each other). But it might be faster to make them hashable and use a dict.

def most_common(lst):
    cur_length = 0
    max_length = 0
    cur_i = 0
    max_i = 0
    cur_item = None
    max_item = None
    for i, item in sorted(enumerate(lst), key=lambda x: x[1]):
        if cur_item is None or cur_item != item:
            if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
                max_length = cur_length
                max_i = cur_i
                max_item = cur_item
            cur_length = 1
            cur_i = i
            cur_item = item
        else:
            cur_length += 1
    if cur_length > max_length or (cur_length == max_length and cur_i < max_i):
        return cur_item
    return max_item

This is an O(n) solution.

mydict   = {}
cnt, itm = 0, ''
for item in reversed(lst):
     mydict[item] = mydict.get(item, 0) + 1
     if mydict[item] >= cnt :
         cnt, itm = mydict[item], item

print itm

(reversed is used to make sure that it returns the lowest index item)