How to count occurrences of unique values in Dictionary?

I have a Dictionary with doubles as values and strings as keys.

I want to count how many times each value occurs in this Dictionary, and I want to know which value each count belongs to (for instance, which values are repeated).

For instance:

key1, 2
key2, 2
key3, 3
key4, 2
key5, 5
key6, 5

I want to get a list:

2 - 3 (times)
3 - 1 (once)
5 - 2 (twice)

How can I do it?

Patryk asked Dec 10 '11 21:12


1 Answer

The first thing to note is that you don't actually care about the keys of the dictionary. Step one, therefore, is to ignore them as irrelevant to the task in hand. We're going to work with the Values property of the dictionary, and the work is much the same as for any other collection of integers (or indeed any other enumerable of any other type we can compare for equality).

There are two common approaches to this problem, both of which are well worth knowing.

The first uses another dictionary to hold the count of values:

//Start with setting up the dictionary you described.
Dictionary<string, int> dict = new Dictionary<string, int>{
    {"key1", 2},
    {"key2", 2},
    {"key3", 3},
    {"key4", 2},
    {"key5", 5},
    {"key6", 5}
};
//Create a different dictionary to store the counts.
Dictionary<int, int> valCount = new Dictionary<int, int>();
//Iterate through the values, setting count to 1 or incrementing current count.
foreach(int i in dict.Values)
    if(valCount.ContainsKey(i))
        valCount[i]++;
    else
        valCount[i] = 1;
//Finally some code to output this and prove it worked:
foreach(KeyValuePair<int, int> kvp in valCount)//note - not sorted, that must be added if needed
    Console.WriteLine("{0} - {1}", kvp.Key, kvp.Value);

Hopefully this is pretty straightforward. Another approach is more complicated but has some pluses:

//Start with setting up the dictionary you described.
Dictionary<string, int> dict = new Dictionary<string, int>{
    {"key1", 2},
    {"key2", 2},
    {"key3", 3},
    {"key4", 2},
    {"key5", 5},
    {"key6", 5}
};
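//GroupBy(), Count() and ToDictionary() are LINQ extension methods, so the file needs "using System.Linq;".
IEnumerable<IGrouping<int, int>> grp = dict.Values.GroupBy(x => x);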
//Two options now. One is to use the results directly such as with the
//equivalent code to output this and prove it worked:
foreach(IGrouping<int, int> item in grp)//note - not sorted, that must be added if needed
    Console.WriteLine("{0} - {1}", item.Key, item.Count());
//Alternatively, we can put these results into another collection for later use:
Dictionary<int, int> valCount = grp.ToDictionary(g => g.Key, g => g.Count());
//Finally some code to output this and prove it worked:
foreach(KeyValuePair<int, int> kvp in valCount)//note - not sorted, that must be added if needed
    Console.WriteLine("{0} - {1}", kvp.Key, kvp.Value);

(We'd probably use var rather than the verbose IEnumerable<IGrouping<int, int>>, but it's worth being precise when explaining code).

In a straight comparison, this version is inferior - both more complicated to understand and less efficient. However, learning this approach allows for some concise and efficient variants of the same technique, so it's worth examining.

GroupBy() takes an enumeration and creates another enumeration that contains key-value pairs where the value is an enumeration too. The lambda x => x means that each element is grouped by its own value, but we have the flexibility to use different grouping rules than that. The contents of grp look a bit like:

{
  {Key=2, {2, 2, 2}}
  {Key=3, {3}}
  {Key=5, {5, 5}}
}

So, if we loop through this and for each group we pull out the Key and call Count() on the group, we get the results we want.
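To make the shape of those groups concrete, here is a small sketch that prints each group's key, its elements and its count. The names GroupingDemo and Describe are made up for illustration, and the odd/even rule is just one hypothetical example of a different grouping rule.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupingDemo
{
    // Describes every group as "key: element, element, ... (count)".
    // GroupingDemo and Describe are illustrative names, not part of any library.
    public static List<string> Describe<T, TKey>(IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        return source
            .GroupBy(keySelector)
            .Select(g => string.Format("{0}: {1} ({2})", g.Key, string.Join(", ", g), g.Count()))
            .ToList();
    }

    public static void Main()
    {
        int[] values = { 2, 2, 3, 2, 5, 5 };

        // Group each value by itself, as x => x does in the answer.
        foreach (string line in Describe(values, x => x))
            Console.WriteLine(line);

        // A hypothetical different rule: group by odd/even instead.
        foreach (string line in Describe(values, x => x % 2 == 0 ? "even" : "odd"))
            Console.WriteLine(line);
    }
}
```

Only the key selector changes between the two calls; the counting logic stays the same, which is the flexibility GroupBy() gives us.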

Now, in the first case we built up our count in a single O(n) pass, while here we build up the groups in an O(n) pass, and then obtain the counts in a second O(n) pass. Both are still O(n) overall, but this version does more work and has to store every group, making it less efficient. It's also a bit harder to understand, so why bother mentioning it?

Well, the first reason is that once we do understand it we can turn the lines:

IEnumerable<IGrouping<int, int>> grp = dict.Values.GroupBy(x => x);
foreach(IGrouping<int, int> item in grp)
    Console.WriteLine("{0} - {1}", item.Key, item.Count());

Into:

foreach(var item in dict.Values.GroupBy(x => x))
    Console.WriteLine("{0} - {1}", item.Key, item.Count());

Which is quite concise, and becomes idiomatic. It's especially nice if we want to then go on and do something more complicated with the value-count pairs as we can chain this into another operation.
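As one possible illustration of that chaining, the sketch below sorts the value-count pairs by value (the earlier loops note that the output is not sorted) and can optionally keep only values seen a minimum number of times. SortedCounts, Tally and minCount are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SortedCounts
{
    // Counts each value, keeps those seen at least minCount times, sorts by value.
    // SortedCounts, Tally and minCount are illustrative names, not from the answer.
    public static List<KeyValuePair<int, int>> Tally(IDictionary<string, int> dict, int minCount = 1)
    {
        return dict.Values
            .GroupBy(x => x)
            .Select(g => new KeyValuePair<int, int>(g.Key, g.Count()))
            .Where(pair => pair.Value >= minCount)
            .OrderBy(pair => pair.Key)
            .ToList();
    }

    public static void Main()
    {
        var dict = new Dictionary<string, int>
        {
            {"key1", 2}, {"key2", 2}, {"key3", 3},
            {"key4", 2}, {"key5", 5}, {"key6", 5}
        };

        // Every value with its count, in ascending order of value.
        foreach (var pair in Tally(dict))
            Console.WriteLine("{0} - {1}", pair.Key, pair.Value);

        // Only the values that are actually repeated.
        foreach (var pair in Tally(dict, 2))
            Console.WriteLine("{0} is repeated {1} times", pair.Key, pair.Value);
    }
}
```

With the data from the question, the first loop lists 2, 3 and 5 with counts 3, 1 and 2, and the second drops the 3 because it occurs only once.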

The version that puts the results into a dictionary can be even more concise still:

var valCount = dict.Values.GroupBy(x => x).ToDictionary(g => g.Key, g => g.Count());

There, your whole question answered in one short line, rather than the 6 (cutting out comments) for the first version.

(Some might prefer to replace dict.Values.GroupBy(x => x) with dict.GroupBy(x => x.Value) which will have exactly the same results once we run the Count() on it. If you aren't immediately sure why, try to work it out).
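Grouping the pairs rather than the bare values also keeps the keys around, which can be handy if we want to know which keys share a value and not just how many do. A minimal sketch, assuming that is something you want; KeysByValue and Build are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KeysByValue
{
    // Maps each distinct value to the list of keys that hold it.
    // KeysByValue and Build are illustrative names, not from the answer.
    public static Dictionary<int, List<string>> Build(IDictionary<string, int> dict)
    {
        return dict
            .GroupBy(pair => pair.Value, pair => pair.Key) // group the keys by their value
            .ToDictionary(g => g.Key, g => g.ToList());
    }

    public static void Main()
    {
        var dict = new Dictionary<string, int>
        {
            {"key1", 2}, {"key2", 2}, {"key3", 3},
            {"key4", 2}, {"key5", 5}, {"key6", 5}
        };

        foreach (var entry in Build(dict))
            Console.WriteLine("{0} - {1} ({2})",
                entry.Key, entry.Value.Count, string.Join(", ", entry.Value));
    }
}
```

The count for each value is just the length of its key list, so this carries the same information as the count-only version plus the keys themselves.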

The other advantage is that we have more flexibility with GroupBy in other cases. For these reasons, people who are used to using GroupBy are quite likely to start off with the one-line concision of dict.Values.GroupBy(x => x).ToDictionary(g => g.Key, g => g.Count()); and then change to the more verbose but more efficient form of the first version (where we increment running totals in the new dictionary) if it proved a performance hotspot.
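If you do end up switching to the running-totals form, one way to keep it tidy is to wrap it in a small generic helper. The sketch below is only one possible shape (ValueCounter and CountValues are made-up names); it uses TryGetValue so each value costs one lookup and one write, and it runs on double values as in the original question.

```csharp
using System;
using System.Collections.Generic;

public static class ValueCounter
{
    // Counts how often each value occurs, in a single pass over source.Values.
    // ValueCounter and CountValues are illustrative names, not a library API.
    // Assumes no value is null, since the values become dictionary keys.
    public static Dictionary<TValue, int> CountValues<TKey, TValue>(IDictionary<TKey, TValue> source)
    {
        var counts = new Dictionary<TValue, int>();
        foreach (TValue value in source.Values)
        {
            int current;
            counts.TryGetValue(value, out current); // current is 0 if value is not there yet
            counts[value] = current + 1;
        }
        return counts;
    }

    public static void Main()
    {
        var dict = new Dictionary<string, double>
        {
            {"key1", 2.0}, {"key2", 2.0}, {"key3", 3.0},
            {"key4", 2.0}, {"key5", 5.0}, {"key6", 5.0}
        };

        foreach (KeyValuePair<double, int> kvp in CountValues(dict))
            Console.WriteLine("{0} - {1}", kvp.Key, kvp.Value);
    }
}
```

One caveat with doubles: values are matched by exact equality, so two numbers that differ only by floating-point rounding are counted separately. If that matters for your data, you would need to round or bucket the values before counting.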

Jon Hanna answered Oct 26 '22 02:10