Join lists by value

Question

Is there an efficient way to merge two lists of tuples in Python based on a common value? Currently, I'm doing the following:

name = [
         (9, "John", "Smith"),
         (11, "Bob", "Dobbs"),
         (14, "Joe", "Bloggs")
         ]

occupation = [
              (9, "Builder"),
              (11, "Baker"),
              (14, "Candlestick Maker")
              ]

name_and_job = []

for n in name:
    for o in occupation:
        if n[0] == o[0]:
            name_and_job.append( (n[0], n[1], n[2], o[1]) )


print(name_and_job)

returns:

[(9, 'John', 'Smith', 'Builder'), (11, 'Bob', 'Dobbs', 'Baker'), (14, 'Joe', 'Bloggs', 'Candlestick Maker')]

While this code works perfectly fine for small lists, it's incredibly slow for longer lists with millions of records. Is there a more efficient way to write this?

EDIT The numbers in the first column are unique.

EDIT Modified @John Kugelman's code slightly. Added a get(), in case a key in the names dictionary has no matching key in the jobs dictionary. The output below is what you get when jobs has no entry for 9:

>>> names_and_jobs = {id: names[id] + (jobs.get(id),) for id in names}
>>> print(names_and_jobs)
{9: ('John', 'Smith', None), 11: ('Bob', 'Dobbs', 'Baker'), 14: ('Joe', 'Bloggs', 'Candlestick Maker')}

John Kugelman · Accepted Answer

Use dictionaries instead of flat lists.

names = {
    9:  ("John", "Smith"),
    11: ("Bob", "Dobbs"),
    14: ("Joe", "Bloggs")
} 

jobs = {
    9:  "Builder",
    11: "Baker",
    14: "Candlestick Maker"
}

If you need to convert them to this format, you can do:

>>> {id: (first, last) for id, first, last in name}
{9: ('John', 'Smith'), 11: ('Bob', 'Dobbs'), 14: ('Joe', 'Bloggs')}
>>> {id: job for id, job in occupation}
{9: 'Builder', 11: 'Baker', 14: 'Candlestick Maker'}

It'd then be a piece of cake to merge the two.

names_and_jobs = {id: names[id] + (jobs[id],) for id in names}
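
If you need a list of tuples back, in the same order as the original name list, the same idea can be sketched as below. This is a minimal sketch, not part of the original answer: join_by_id is a hypothetical helper name, and it assumes the ids are unique and that every occupation row is a 2-tuple, so dict() can build the lookup directly. It uses get() so a missing id yields None instead of raising KeyError.

```python
# Sample data from the question.
name = [(9, "John", "Smith"), (11, "Bob", "Dobbs"), (14, "Joe", "Bloggs")]
occupation = [(9, "Builder"), (11, "Baker"), (14, "Candlestick Maker")]


def join_by_id(name, occupation):
    """Hypothetical helper: join two lists of tuples on their first element."""
    # One pass over occupation builds an id -> job lookup table.
    jobs = dict(occupation)
    # One pass over name; each lookup is a hash lookup, not a scan.
    # get() returns None when an id has no matching job.
    return [row + (jobs.get(row[0]),) for row in name]


name_and_job = join_by_id(name, occupation)
print(name_and_job)
```

Each list is traversed once, so the work grows with len(name) + len(occupation) rather than with their product, as it does in the nested loop.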

Padraic Cunningham · Answer

from collections import OrderedDict
from itertools import chain

od = OrderedDict()


for ele in chain(name,occupation):
    od.setdefault(ele[0], []).extend(ele[1:])


print([[k]+val for k,val in od.items()])

[[9, 'John', 'Smith', 'Builder'], [11, 'Bob', 'Dobbs', 'Baker'], [14, 'Joe', 'Bloggs', 'Candlestick Maker']]

If you want the data ordered by how it appears in name, you need to use an OrderedDict, as normal dicts are not guaranteed to keep insertion order before Python 3.7.
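
On Python 3.7 or later, built-in dicts keep insertion order, so the same loop can probably be written with a plain dict. A minimal sketch under that version assumption (the variable names merged and rows are only illustrative):

```python
from itertools import chain

# Sample data from the question.
name = [(9, "John", "Smith"), (11, "Bob", "Dobbs"), (14, "Joe", "Bloggs")]
occupation = [(9, "Builder"), (11, "Baker"), (14, "Candlestick Maker")]

# Assumes Python 3.7+, where a plain dict preserves insertion order.
merged = {}
for ele in chain(name, occupation):
    # The first element is the id; collect the remaining fields under it.
    merged.setdefault(ele[0], []).extend(ele[1:])

rows = [[k] + val for k, val in merged.items()]
print(rows)
```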

You can also build the desired tuples inside the loop, then call od.values() to get the list of tuples:

from collections import OrderedDict
from itertools import chain

od = OrderedDict()

for ele in chain(name, occupation):
    k = ele[0]
    if k in od:
        od[k] = od[k] + ele[1:]
    else:
        od[k] = ele

print(list(od.values()))
[(9, 'John', 'Smith', 'Builder'), (11, 'Bob', 'Dobbs', 'Baker'), (14, 'Joe', 'Bloggs', 'Candlestick Maker')]
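
To check how much a dictionary lookup helps on your own data, a rough timing sketch follows. The function names nested_join and dict_join, the generated test data, and the size of 1000 rows are all assumptions made for illustration; absolute timings will vary by machine, so no expected numbers are shown.

```python
import random
import timeit


def nested_join(name, occupation):
    """The nested-loop join from the question."""
    out = []
    for n in name:
        for o in occupation:
            if n[0] == o[0]:
                out.append((n[0], n[1], n[2], o[1]))
    return out


def dict_join(name, occupation):
    """Dictionary-based join; assumes unique ids present in both lists."""
    jobs = dict(occupation)
    return [n + (jobs[n[0]],) for n in name]


# Synthetic data: ids 0..size-1, with occupation in shuffled order.
size = 1000
name = [(i, "First%d" % i, "Last%d" % i) for i in range(size)]
occupation = [(i, "Job%d" % i) for i in range(size)]
random.shuffle(occupation)

# Both approaches should produce the same rows in the same order.
assert nested_join(name, occupation) == dict_join(name, occupation)

print("nested:", timeit.timeit(lambda: nested_join(name, occupation), number=3))
print("dict:  ", timeit.timeit(lambda: dict_join(name, occupation), number=3))
```

The nested loop does roughly size * size comparisons per run, while the dictionary version does about 2 * size operations, so the gap should widen quickly as size grows.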

Tags: performance, python, dictionary, list, for-loop

Jesse Reilly
