A recipe to group/aggregate data?

Tags:

python

list

grouping

I have some data stored in a list that I would like to group based on a value.

For example, if my data is

data = [(1, 'a'), (2, 'x'), (1, 'b')]

and I want to group it by the first value in each tuple to get

result = [(1, 'ab'), (2, 'x')]

how would I go about it?

More generally, what's the recommended way to group data in python? Is there a recipe that can help me?

640

asked Apr 29 '18 12:04

Aran-Fey

1 Answers

The go-to data structure to use for all kinds of grouping is the dict. The idea is to use something that uniquely identifies a group as the dict's keys, and store all values that belong to the same group under the same key.

As an example, your data could be stored in a dict like this:

{1: ['a', 'b'],
 2: ['x']}

The integer that you're using to group the values is used as the dict key, and the values are aggregated in a list.

We use a dict because it maps keys to values in O(1) time on average, which makes grouping both efficient and easy. The general structure of the code is the same for all kinds of grouping tasks: you iterate over your data and gradually fill a dict with grouped values. Using a defaultdict instead of a regular dict makes this even easier, because we don't have to initialize each group with an empty list ourselves.

import collections

groupdict = collections.defaultdict(list)
for value in data:
    group = value[0]
    value = value[1]
    groupdict[group].append(value)

# result:
# {1: ['a', 'b'],
#  2: ['x']}

Once the data is grouped, all that's left is to convert the dict to your desired output format:

result = [(key, ''.join(values)) for key, values in groupdict.items()]
# result: [(1, 'ab'), (2, 'x')]
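
To see what the defaultdict saves us, here is a minimal sketch of the same loop with a regular dict. It assumes the data list from the question and uses dict.setdefault to create the empty list the first time a group is seen:

```python
data = [(1, 'a'), (2, 'x'), (1, 'b')]

groupdict = {}
for group, value in data:
    # setdefault returns the list already stored for this group, or
    # stores and returns a new empty list if the group is new
    groupdict.setdefault(group, []).append(value)

result = [(key, ''.join(values)) for key, values in groupdict.items()]
print(result)  # [(1, 'ab'), (2, 'x')]
```

Both versions produce the same groups; the defaultdict version just keeps the loop body a little cleaner.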

The Grouping Recipe

The following section will provide recipes for different kinds of inputs and outputs, and show how to group by various things. The basis for everything is the following snippet:

import collections

groupdict = collections.defaultdict(list)
for value in data:  # input
    group = ???  # group identifier
    value = ???  # value to add to the group
    groupdict[group].append(value)

result = groupdict  # output

Each of the commented lines can (or must) be customized depending on your use case.

Input

The format of your input data dictates how you iterate over it.

In this section, we're customizing the for value in data: line of the recipe.

A list of values

More often than not, all the values are stored in a flat list:
```
data = [value1, value2, value3, ...]
```
In this case we simply iterate over the list with a for loop:
```
for value in data:
```
Multiple lists

If you have multiple lists with each list holding the value of a different attribute like
```
firstnames = [firstname1, firstname2, ...]
middlenames = [middlename1, middlename2, ...]
lastnames = [lastname1, lastname2, ...]
```
use the zip function to iterate over all lists simultaneously:
```
for value in zip(firstnames, middlenames, lastnames):
```
This will make value a tuple of (firstname, middlename, lastname).
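
As a rough illustration, the sketch below plugs the zip loop into the recipe and groups the resulting tuples by last name. The names are made-up sample data, not part of the original question:

```python
import collections

# Made-up sample data, one list per attribute
firstnames = ['John', 'Jane', 'Jim']
middlenames = ['A.', 'B.', 'C.']
lastnames = ['Doe', 'Doe', 'Roe']

groupdict = collections.defaultdict(list)
for value in zip(firstnames, middlenames, lastnames):
    group = value[2]  # group by the last name
    groupdict[group].append(value)

result = dict(groupdict)
print(result)
# {'Doe': [('John', 'A.', 'Doe'), ('Jane', 'B.', 'Doe')], 'Roe': [('Jim', 'C.', 'Roe')]}
```

Note that zip stops at the shortest input, so lists of unequal length silently lose their trailing elements.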
Multiple dicts or a list of dicts

If you want to combine multiple dicts like
```
dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 5}
```
First put them all in a list:
```
dicts = [dict1, dict2]
```
And then use two nested loops to iterate over all (key, value) pairs:
```
for dict_ in dicts:
    for value in dict_.items():
```
In this case, the value variable will take the form of a 2-element tuple like ('a', 1) or ('b', 2).
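
Put together with the rest of the recipe, combining the two dicts above might look like the following sketch, which groups by dict key and keeps only the dict values:

```python
import collections

dict1 = {'a': 1, 'b': 2}
dict2 = {'b': 5}
dicts = [dict1, dict2]

groupdict = collections.defaultdict(list)
for dict_ in dicts:
    for value in dict_.items():
        group = value[0]  # the dict key identifies the group
        value = value[1]  # keep only the dict value
        groupdict[group].append(value)

result = dict(groupdict)
print(result)  # {'a': [1], 'b': [2, 5]}
```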

Grouping

Here we'll cover various ways to extract group identifiers from your data.

In this section, we're customizing the group = ??? line of the recipe.

Grouping by a list/tuple/dict element

If your values are lists or tuples like (attr1, attr2, attr3, ...) and you want to group them by the nth element:
```
group = value[n]
```
The syntax is the same for dicts, so if you have values like {'firstname': 'foo', 'lastname': 'bar'} and you want to group by the first name:
```
group = value['firstname']
```
Grouping by an attribute

If your values are objects like datetime.date(2018, 5, 27) and you want to group them by an attribute, like year:
```
group = value.year
```
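A short sketch of attribute-based grouping, using a few arbitrary sample dates and the year attribute as the group identifier:

```python
import collections
import datetime

# Arbitrary sample dates
dates = [
    datetime.date(2018, 5, 27),
    datetime.date(2017, 1, 3),
    datetime.date(2018, 11, 9),
]

groupdict = collections.defaultdict(list)
for value in dates:
    group = value.year  # group by the year attribute
    groupdict[group].append(value)

print(sorted(groupdict))     # [2017, 2018]
print(len(groupdict[2018]))  # 2
```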
Grouping by a key function

Sometimes you have a function that returns a value's group when it's called. For example, you could use the len function to group values by their length:
```
group = len(value)
```
Grouping by multiple values

If you wish to group your data by more than a single value, you can use a tuple as the group identifier. For example, to group strings by their first letter and their length:
```
group = (value[0], len(value))
```
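For instance, a sketch with a made-up list of words shows how the tuple keys end up in the dict:

```python
import collections

# Made-up sample data
words = ['apple', 'angel', 'avocado', 'bee', 'bat']

groupdict = collections.defaultdict(list)
for value in words:
    group = (value[0], len(value))  # (first letter, length)
    groupdict[group].append(value)

result = dict(groupdict)
print(result)
# {('a', 5): ['apple', 'angel'], ('a', 7): ['avocado'], ('b', 3): ['bee', 'bat']}
```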
Grouping by something unhashable

Because dict keys must be hashable, you will run into problems if you try to group by something that can't be hashed. In such a case, you have to find a way to convert the unhashable value to a hashable representation.
1. sets: Convert sets to frozensets, which are hashable:
```
group = frozenset(group)
```
2. dicts: Dicts can be represented as sorted (key, value) tuples, provided the keys can be compared with each other and the values are themselves hashable:
```
group = tuple(sorted(group.items()))
```
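The sketch below shows both conversions in action. The (name, tags) pairs are invented sample data; the point is only that two equal sets map to the same frozenset key:

```python
import collections

# Made-up sample data: (name, set of tags) pairs
data = [
    ('post1', {'python', 'list'}),
    ('post2', {'list', 'python'}),
    ('post3', {'grouping'}),
]

groupdict = collections.defaultdict(list)
for value in data:
    group = frozenset(value[1])  # sets are unhashable, frozensets are hashable
    value = value[0]
    groupdict[group].append(value)

print(groupdict[frozenset({'python', 'list'})])  # ['post1', 'post2']

# The same idea for a dict used as the group identifier
key = tuple(sorted({'b': 2, 'a': 1}.items()))
print(key)  # (('a', 1), ('b', 2))
```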

Modifying the aggregated values

Sometimes you will want to modify the values you're grouping. For example, if you're grouping tuples like (1, 'a') and (1, 'b') by the first element, you might want to remove the first element from each tuple to get a result like {1: ['a', 'b']} rather than {1: [(1, 'a'), (1, 'b')]}.

In this section, we're customizing the value = ??? line of the recipe.

No change

If you don't want to change the value in any way, simply delete the value = ??? line from your code.
Keeping only a single list/tuple/dict element

If your values are lists like [1, 'a'] and you only want to keep the 'a':
```
value = value[1]
```
Or if they're dicts like {'firstname': 'foo', 'lastname': 'bar'} and you only want to keep the first name:
```
value = value['firstname']
```
Removing the first list/tuple element

If your values are lists like [1, 'a', 'foo'] and [1, 'b', 'bar'] and you want to discard the first element of each list to get a group like [['a', 'foo'], ['b', 'bar']], use the slicing syntax:
```
value = value[1:]
```
Removing/Keeping arbitrary list/tuple/dict elements

If your values are lists like ['foo', 'bar', 'baz'] or dicts like {'firstname': 'foo', 'middlename': 'bar', 'lastname': 'baz'} and you want to delete or keep only some of these elements, start by creating a set of the indices or keys you want to keep or delete. For example:
```
indices_to_keep = {0, 2}
keys_to_delete = {'firstname', 'middlename'}
```
Then choose the appropriate snippet from this list (each one assumes you have defined the set it refers to):
1. To keep list elements: value = [val for i, val in enumerate(value) if i in indices_to_keep]
2. To delete list elements: value = [val for i, val in enumerate(value) if i not in indices_to_delete]
3. To keep dict elements: value = {key: val for key, val in value.items() if key in keys_to_keep}
4. To delete dict elements: value = {key: val for key, val in value.items() if key not in keys_to_delete}
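
As a quick check of how these comprehensions behave, here is a small sketch that applies the "keep" variant to a list and the "delete" variant to a dict, using the example values from above. The names list_value, dict_value, kept and remaining are only illustrative:

```python
indices_to_keep = {0, 2}
keys_to_delete = {'firstname', 'middlename'}

list_value = ['foo', 'bar', 'baz']
dict_value = {'firstname': 'foo', 'middlename': 'bar', 'lastname': 'baz'}

# Keep only the list elements at the chosen indices
kept = [val for i, val in enumerate(list_value) if i in indices_to_keep]

# Drop the dict entries whose key is in the deletion set
remaining = {key: val for key, val in dict_value.items()
             if key not in keys_to_delete}

print(kept)       # ['foo', 'baz']
print(remaining)  # {'lastname': 'baz'}
```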

Output

Once the grouping is complete, we have a defaultdict filled with lists. But the desired result isn't always a (default)dict.

In this section, we're customizing the result = groupdict line of the recipe.

A regular dict

To convert the defaultdict to a regular dict, simply call the dict constructor on it:
```
result = dict(groupdict)
```
A list of (group, value) pairs

To get a result like [(group1, value1), (group1, value2), (group2, value3)] from the dict {group1: [value1, value2], group2: [value3]}, use a list comprehension:
```
result = [(group, value) for group, values in groupdict.items()
                           for value in values]
```
A nested list of just values

To get a result like [[value1, value2], [value3]] from the dict {group1: [value1, value2], group2: [value3]}, use dict.values:
```
result = list(groupdict.values())
```
A flat list of just values

To get a result like [value1, value2, value3] from the dict {group1: [value1, value2], group2: [value3]}, flatten the dict with a list comprehension:
```
result = [value for values in groupdict.values() for value in values]
```
Flattening iterable values

If your values are lists or other iterables like
```
groupdict = {group1: [[list1_value1, list1_value2], [list2_value1]]}
```
and you want a flattened result like
```
result = {group1: [list1_value1, list1_value2, list2_value1]}
```
you have two options:
1. Flatten the lists with a dict comprehension:
```
result = {group: [x for iterable in values for x in iterable]
                          for group, values in groupdict.items()}
```
2. Avoid creating a list of iterables in the first place, by using list.extend instead of list.append. In other words, change
```
groupdict[group].append(value)
```
  to
```
groupdict[group].extend(value)
```
  And then just set result = groupdict.
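
A minimal sketch of the second option, assuming made-up input where every value is already a list:

```python
import collections

# Made-up sample data: (group, list of values) pairs
data = [('a', [1, 2]), ('b', [3]), ('a', [4])]

groupdict = collections.defaultdict(list)
for group, value in data:
    # extend adds each element of `value` individually,
    # so no nested lists are created
    groupdict[group].extend(value)

result = dict(groupdict)
print(result)  # {'a': [1, 2, 4], 'b': [3]}
```

Be careful if the values are strings: extend would add each character as a separate element.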
A sorted list

Since Python 3.7, dicts preserve insertion order, so iterating over the groupdict lists the groups in the order in which they first appeared in the input (in older versions, the order was arbitrary). If that order is fine for you, you can use the recipes shown above. But if you need a different order, you have to sort the output accordingly.

I'll use the following dict to demonstrate how to sort your output in various ways:
```
groupdict = {'abc': [1], 'xy': [2, 5]}
```
Keep in mind that this is a bit of a meta-recipe that may need to be combined with other parts of this answer to get exactly the output you want. The general idea is to sort the dictionary keys before using them to extract the values from the dict:
```
groups = sorted(groupdict.keys())
# groups = ['abc', 'xy']
```
Keep in mind that sorted accepts a key function in case you want to customize the sort order. For example, if the dict keys are strings and you want to sort them by length:
```
groups = sorted(groupdict.keys(), key=len)
# groups = ['xy', 'abc']
```
Once you've sorted the keys, use them to extract the values from the dict in the correct order:
```
# groups = ['abc', 'xy']
result = [groupdict[group] for group in groups]
# result = [[1], [2, 5]]
```
Remember that this can be combined with other parts of this answer to get different kinds of output. For example, if you want to keep the group identifiers:
```
# groups = ['abc', 'xy']
result = [(group, groupdict[group]) for group in groups]
# result = [('abc', [1]), ('xy', [2, 5])]
```
For your convenience, here are some commonly used sort orders:
1. Sort by number of values per group:
```
groups = sorted(groupdict.keys(), key=lambda group: len(groupdict[group]))
result = [groupdict[group] for group in groups]
# result = [[1], [2, 5]]
```
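If you want the largest groups first, sorted also accepts reverse=True. The following sketch uses the demonstration dict from this section and keeps the group identifiers in the output; the helper name group_size is only illustrative:

```python
groupdict = {'abc': [1], 'xy': [2, 5]}

def group_size(group):
    # Number of values stored under this group identifier
    return len(groupdict[group])

# Largest groups first
groups = sorted(groupdict, key=group_size, reverse=True)
result = [(group, groupdict[group]) for group in groups]
print(result)  # [('xy', [2, 5]), ('abc', [1])]
```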
Counting the number of values in each group

To count the number of elements associated with each group, use the len function:
```
result = {group: len(values) for group, values in groupdict.items()}
```
If you want to count the number of distinct elements, use set to eliminate duplicates:
```
result = {group: len(set(values)) for group, values in groupdict.items()}
```
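Here is a small end-to-end sketch of both counts on made-up data. It also shows an alternative that is not part of the recipe: if you only need the totals, collections.Counter can count the group identifiers directly, without storing the values at all.

```python
import collections

# Made-up sample data with one duplicate value in group 1
data = [(1, 'a'), (2, 'x'), (1, 'b'), (1, 'a')]

groupdict = collections.defaultdict(list)
for group, value in data:
    groupdict[group].append(value)

counts = {group: len(values) for group, values in groupdict.items()}
distinct = {group: len(set(values)) for group, values in groupdict.items()}
print(counts)    # {1: 3, 2: 1}
print(distinct)  # {1: 2, 2: 1}

# Alternative when only the totals matter: count the group identifiers
direct_counts = collections.Counter(group for group, _ in data)
print(dict(direct_counts))  # {1: 3, 2: 1}
```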

An example

To demonstrate how to piece together a working solution from this recipe, let's try to turn an input of

data = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]

into

result = [["A", "C"], ["B"], ["D", "E"]]

In other words, we're grouping lists by their 2nd element.

The first two lines of the recipe are always the same, so let's start by copying those:

import collections

groupdict = collections.defaultdict(list)

Now we have to find out how to loop over the input. Since our input is a simple list of values, a normal for loop will suffice:

for value in data:

Next we have to extract the group identifier from the value. We're grouping by the 2nd list element, so we use indexing:

    group = value[1]

The next step is to transform the value. Since we only want to keep the first element of each list, we once again use list indexing:

    value = value[0]

Finally, we have to figure out how to turn the dict we generated into a list. What we want is a list of values, without the groups. We consult the Output section of the recipe to find the appropriate dict flattening snippet:

result = list(groupdict.values())

Et voilà:

data = [["A",0], ["B",1], ["C",0], ["D",2], ["E",2]]

import collections

groupdict = collections.defaultdict(list)
for value in data:
    group = value[1]
    value = value[0]
    groupdict[group].append(value)

result = list(groupdict.values())
# result: [["A", "C"], ["B"], ["D", "E"]]
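
If you find yourself applying the recipe often, one possible way to package it is a small helper function. The function name group_by and its key and transform parameters are hypothetical, chosen here only for illustration:

```python
import collections

def group_by(data, key, transform=None):
    """Hypothetical helper: group the values in `data` by `key(value)`.

    If `transform` is given, it is applied to each value before the
    value is added to its group.
    """
    groupdict = collections.defaultdict(list)
    for value in data:
        group = key(value)
        if transform is not None:
            value = transform(value)
        groupdict[group].append(value)
    return dict(groupdict)

data = [["A", 0], ["B", 1], ["C", 0], ["D", 2], ["E", 2]]
groups = group_by(data, key=lambda v: v[1], transform=lambda v: v[0])
result = list(groups.values())
print(result)  # [['A', 'C'], ['B'], ['D', 'E']]
```

The key and transform arguments correspond to the group = ??? and value = ??? lines of the recipe; the input and output steps stay outside the function.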

172

answered Oct 20 '22 22:10

Aran-Fey
