Expand a dict containing list items into a list of dict pairs

If I have a dictionary containing lists in one or more of its values:

data = {
  'a':0,
  'b':1,
  'c':[0, 1, 2],
  'pair':['one','two']
}

How can I get a list of tuples of dicts, with each tuple pairing the values of pair for one value of c, and all other keys held constant? E.g.

output = [
    ({
        'a':0,
        'b':1,
        'c':0,
        'pair':'one'
    },
    {
        'a':0,
        'b':1,
        'c':0,
        'pair':'two'
    }),
    ({
        'a':0,
        'b':1,
        'c':1,
        'pair':'one'
    },
    ...
]
r3robertson asked Jun 24 '18 01:06


4 Answers

Well, this doesn't feel especially elegant, but you might use a nested for loop or list comprehension:

output = []
for i in data['c']:
  output.append(tuple({'a': 0, 'b': 1, 'c': i, 'pair': p} for p in data['pair']))

or

output = [tuple({'a': 0, 'b': 1, 'c': i, 'pair': p} for p in data['pair']) for i in data['c']]

A cleaner solution might separate out the generation of the component dict into a function, like this:

def gen_output_dict(c, pair):
  return {'a': 0, 'b': 1, 'c': c, 'pair': pair}

output = []
for i in data['c']:
  output.append(tuple(gen_output_dict(i, p) for p in data['pair']))
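
If hard-coding 'a' and 'b' is undesirable, the constant keys can be copied from data instead. This is only a minimal sketch, assuming Python 3.5+ for the {**d, ...} unpacking syntax; the function name expand_pairs is hypothetical and does not come from the question:

```python
data = {
    'a': 0,
    'b': 1,
    'c': [0, 1, 2],
    'pair': ['one', 'two'],
}

def expand_pairs(d):
    """Return one tuple per value of d['c'], with one dict per value of d['pair']."""
    return [
        # {**d, ...} copies every key of d, then overrides 'c' and 'pair'.
        tuple({**d, 'c': c, 'pair': p} for p in d['pair'])
        for c in d['c']
    ]

output = expand_pairs(data)
print(output[0])
```

Because each dict is built fresh from data, the original dictionary is left unchanged.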
davidshere answered Oct 13 '22 22:10


You can use itertools.product on list values and keep track of the key from which each element originated. Since the key 'pair' has a special meaning, you should treat it separately.

Code

from itertools import product

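def unzip_dict(d):
    # Keys whose values are lists to expand; 'pair' is handled separately below.
    keys = [k for k, v in d.items() if isinstance(v, list) and k != 'pair']
    value_lists = [d[k] for k in keys]

    # One tuple per combination of list values, one dict per 'pair' value.
    for combo in product(*value_lists):
        yield tuple({**d, **dict(zip(keys, combo)), 'pair': pair} for pair in d['pair'])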

Example

data = {
    'a': 0,
    'c': [1, 2],
    'pair': ['one', 'two']
}

print(*unzip_dict(data), sep='\n')

Output

({'a': 0, 'c': 1, 'pair': 'one'}, {'a': 0, 'c': 1, 'pair': 'two'})
({'a': 0, 'c': 2, 'pair': 'one'}, {'a': 0, 'c': 2, 'pair': 'two'})
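
A sketch of how the same generator behaves when more than one key holds a list. The key 'd' and its values are hypothetical, added only for illustration, and the function is repeated so the snippet runs on its own (Python 3.5+ is assumed for the ** unpacking):

```python
from itertools import product

def unzip_dict(d):
    # Same approach as above: expand every list-valued key except 'pair'.
    keys = [k for k, v in d.items() if isinstance(v, list) and k != 'pair']
    for combo in product(*(d[k] for k in keys)):
        yield tuple({**d, **dict(zip(keys, combo)), 'pair': p} for p in d['pair'])

# 'd' is a hypothetical extra list-valued key, not part of the original question.
data = {'a': 0, 'c': [1, 2], 'd': ['x', 'y'], 'pair': ['one', 'two']}

result = list(unzip_dict(data))
for row in result:
    print(row)
```

With two values each for 'c' and 'd', product yields four combinations, so result holds four tuples of two dicts each.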
Olivier Melançon answered Oct 14 '22 00:10


The following is quite an extended solution:

data = {
  'a':0,
  'b':1,
  'c':[0, 1, 2],
  'pair':['one','two']
}

# Get the length of the longest sequence
length = max(map(lambda x: len(x) if isinstance(x, list) else 1, data.values()))

# Loop through the data and change scalars to sequences
# while also making sure that smaller sequences are stretched to match
# or exceed the length of the longest sequence
for k, v in data.items():
    if isinstance(v, list):
        data[k] = v * -(-length // len(v))  # ceiling division, so the result is never shorter than length
    else:
        data[k] = [v] * length

# Create a dictionary to keep track of which outputs
# need to end up in which tuple
seen = dict.fromkeys(data.get('pair'), 0)
output = [tuple()] * len(seen)

# Loop through the data and place dictionaries in their
# corresponding tuples.
for v in zip(*data.values()):
    d = dict(zip(data, v))
    output[seen[d.get('pair')]] += (d,)
    seen[d.get('pair')] += 1

print(output)

The idea is to convert the scalars in your data to sequences whose lengths match that of the longest sequence in the original data. First, the size of the longest sequence is assigned to the variable length. We then loop through the original data, converting scalars to sequences and extending the existing sequences to match the size of the longest one. Once that's done, we generate the output variable. Before doing so, we create a dictionary called seen, which helps us both create the list of tuples and keep track of which group of dictionaries ends up in which tuple. One final loop then places the dictionaries into their corresponding tuples.

The current output looks like the following. Note that values are paired by position, so it does not contain every combination of c and pair:

[({'a': 0, 'b': 1, 'c': 0, 'pair': 'one'},
  {'a': 0, 'b': 1, 'c': 1, 'pair': 'two'}),
 ({'a': 0, 'b': 1, 'c': 2, 'pair': 'one'},)]
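
The stretching step described above can be sketched in isolation. The helper below is hypothetical (stretch is not part of this answer's code) and swaps in itertools.cycle with islice to repeat a value to exactly the target length, rather than multiplying the list by a computed factor:

```python
from itertools import cycle, islice

def stretch(value, length):
    """Repeat a scalar or a list until it has exactly `length` items."""
    items = value if isinstance(value, list) else [value]
    return list(islice(cycle(items), length))

print(stretch(0, 3))               # [0, 0, 0]
print(stretch(['one', 'two'], 3))  # ['one', 'two', 'one']
```

Once every value has the same length, zip lines them up position by position, which is why 'c': 2 ends up alone with 'one' in the output above.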

Please let me know if you need any more clarifying details. Otherwise, I do hope this serves some purpose.

Abdou answered Oct 13 '22 22:10


@r3robertson, you can also try the code below. It is based on a list comprehension and the deepcopy() operation in Python.

Check Shallow copy vs deepcopy in Python.

import pprint
import copy

data = {
    'a': 0,
    'b': 1,
    'c': [0, 1, 2],
    'pair': ['one', 'two'],
}

def get_updated_dict(data, index, pair_name):
    d = copy.deepcopy(data)
    d.update({'c': index, 'pair': pair_name})
    return d

output = [tuple(get_updated_dict(data, index, pair_name) for pair_name in data['pair']) for index in data['c']]

# Pretty printing the output list.
pprint.pprint(output, indent=4)

Output »

[   (   {   'a': 0, 'b': 1, 'c': 0, 'pair': 'one'},
        {   'a': 0, 'b': 1, 'c': 0, 'pair': 'two'}),
    (   {   'a': 0, 'b': 1, 'c': 1, 'pair': 'one'},
        {   'a': 0, 'b': 1, 'c': 1, 'pair': 'two'}),
    (   {   'a': 0, 'b': 1, 'c': 2, 'pair': 'one'},
        {   'a': 0, 'b': 1, 'c': 2, 'pair': 'two'})]
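
On the shallow versus deep copy point mentioned above, a small sketch of the difference. The dictionary original is made up for illustration and is not the question's data:

```python
import copy

original = {'a': 0, 'c': [0, 1, 2]}

shallow = copy.copy(original)   # new dict, but the inner list is shared
deep = copy.deepcopy(original)  # new dict and a new inner list

original['c'].append(3)

print(shallow['c'])  # [0, 1, 2, 3]
print(deep['c'])     # [0, 1, 2]
```

In get_updated_dict both list values are overwritten by update(), so for this particular data a shallow copy would give the same result. deepcopy is the more cautious choice if other values might be mutable.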

Pretty printing using json module »

Note: Tuples are serialized as lists here, because JSON has no tuple type.

import json
print(json.dumps(output, indent=4))

Output »

[
    [
        {
            "a": 0,
            "c": 0,
            "b": 1,
            "pair": "one"
        },
        {
            "a": 0,
            "c": 0,
            "b": 1,
            "pair": "two"
        }
    ],
    [
        {
            "a": 0,
            "c": 1,
            "b": 1,
            "pair": "one"
        },
        {
            "a": 0,
            "c": 1,
            "b": 1,
            "pair": "two"
        }
    ],
    [
        {
            "a": 0,
            "c": 2,
            "b": 1,
            "pair": "one"
        },
        {
            "a": 0,
            "c": 2,
            "b": 1,
            "pair": "two"
        }
    ]
]
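
A short sketch of the tuple-to-list conversion noted above, and one way to undo it after loading. The sample list pairs is a trimmed-down, hypothetical stand-in for output:

```python
import json

pairs = [({'c': 0, 'pair': 'one'}, {'c': 0, 'pair': 'two'})]

encoded = json.dumps(pairs)
decoded = json.loads(encoded)

print(type(pairs[0]).__name__)    # tuple
print(type(decoded[0]).__name__)  # list

# Convert the inner lists back to tuples if the tuple structure matters.
restored = [tuple(group) for group in decoded]
print(restored == pairs)          # True
```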
hygull answered Oct 14 '22 00:10