
Join lists by value

Is there an efficient way to merge two lists of tuples in Python based on a common value? Currently, I'm doing the following:

name = [
         (9, "John", "Smith"),
         (11, "Bob", "Dobbs"),
         (14, "Joe", "Bloggs")
         ]

occupation = [
              (9, "Builder"),
              (11, "Baker"),
              (14, "Candlestick Maker")
              ]

name_and_job = []

for n in name:
    for o in occupation:
        if n[0] == o[0]:
            name_and_job.append( (n[0], n[1], n[2], o[1]) )


print(name_and_job)

returns:

[(9, 'John', 'Smith', 'Builder'), (11, 'Bob', 'Dobbs', 'Baker'), (14, 'Joe', 'Bloggs', 'Candlestick Maker')]

While this code works perfectly fine for small lists, it's incredibly slow for longer lists with millions of records. Is there a more efficient way to write this?

EDIT The numbers in the first column are unique.

EDIT Modified @John Kugelman's code slightly: I added a get() in case an id in the names dictionary has no matching key in the jobs dictionary. The output below is for a jobs dictionary with no entry for id 9:

>>> names_and_jobs = {id: names[id] + (jobs.get(id),) for id in names}
>>> print(names_and_jobs)
{9: ('John', 'Smith', None), 11: ('Bob', 'Dobbs', 'Baker'), 14: ('Joe', 'Bloggs', 'Candlestick Maker')}
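
To make the None in that output concrete, here is a minimal sketch. It assumes the names and jobs dictionaries used in the dictionary-based answer, with the id 9 entry deliberately left out of jobs; that omission is an illustration, not part of the original data, and id_ is just a name chosen to avoid shadowing the built-in id.

```python
# Assumed inputs: the dictionaries from the dictionary-based answer,
# with id 9 deliberately missing from jobs to show the fallback.
names = {
    9: ("John", "Smith"),
    11: ("Bob", "Dobbs"),
    14: ("Joe", "Bloggs"),
}
jobs = {
    11: "Baker",
    14: "Candlestick Maker",
}

# jobs.get(id_) returns None instead of raising KeyError for a missing id.
names_and_jobs = {id_: names[id_] + (jobs.get(id_),) for id_ in names}
print(names_and_jobs)
```

With jobs[id_] instead of jobs.get(id_), the same input would raise a KeyError for id 9.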
Jesse Reilly asked Jun 03 '15 22:06



2 Answers

Use dictionaries instead of flat lists.

names = {
    9:  ("John", "Smith"),
    11: ("Bob", "Dobbs"),
    14: ("Joe", "Bloggs")
} 

jobs = {
    9:  "Builder",
    11: "Baker",
    14: "Candlestick Maker"
}

If you need to convert them to this format, you can do:

>>> {id: (first, last) for id, first, last in name}
{9: ('John', 'Smith'), 11: ('Bob', 'Dobbs'), 14: ('Joe', 'Bloggs')}
>>> {id: job for id, job in occupation}
{9: 'Builder', 11: 'Baker', 14: 'Candlestick Maker'}

It'd then be a piece of cake to merge the two.

names_and_jobs = {id: names[id] + (jobs[id],) for id in names}
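
Putting the steps above together, the following is a rough end-to-end sketch that starts from the original lists of tuples and returns a list of tuples, as in the question. The helper name join_by_id is hypothetical and not part of the original answer. It does one pass over each list, so the work grows with len(name) + len(occupation) rather than with their product, as it does in the nested loop.

```python
def join_by_id(name, occupation):
    """Join two lists of tuples on their first element (hypothetical helper)."""
    # One pass over occupation builds an id -> job lookup table.
    jobs = {id_: job for id_, job in occupation}
    # One pass over name; each lookup is an average O(1) dict access.
    return [
        (id_, first, last, jobs[id_])
        for id_, first, last in name
        if id_ in jobs  # skip ids with no occupation, as the nested loop does
    ]


name = [
    (9, "John", "Smith"),
    (11, "Bob", "Dobbs"),
    (14, "Joe", "Bloggs"),
]
occupation = [
    (9, "Builder"),
    (11, "Baker"),
    (14, "Candlestick Maker"),
]

name_and_job = join_by_id(name, occupation)
print(name_and_job)
```

This relies on the ids being unique, as stated in the question; with duplicate ids in occupation, the dict would keep only the last job for each id.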
John Kugelman answered Oct 10 '22 04:10



from collections import OrderedDict
from itertools import chain

od = OrderedDict()


for ele in chain(name,occupation):
    od.setdefault(ele[0], []).extend(ele[1:])


print([[k]+val for k,val in od.items()])

[[9, 'John', 'Smith', 'Builder'], [11, 'Bob', 'Dobbs', 'Baker'], [14, 'Joe', 'Bloggs', 'Candlestick Maker']]

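If you want the data ordered by how it appears in name, use an OrderedDict on Python versions before 3.7, where plain dicts do not guarantee insertion order. From Python 3.7 onward, a plain dict preserves insertion order as well.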
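
On Python 3.7 and later, where plain dicts keep insertion order, the same grouping can be sketched without OrderedDict. This is a variation on the code above, not the original answer's version; the names merged and rows are chosen here for illustration.

```python
from itertools import chain

name = [
    (9, "John", "Smith"),
    (11, "Bob", "Dobbs"),
    (14, "Joe", "Bloggs"),
]
occupation = [
    (9, "Builder"),
    (11, "Baker"),
    (14, "Candlestick Maker"),
]

merged = {}  # plain dict; relies on Python 3.7+ insertion ordering
for ele in chain(name, occupation):
    # First sight of an id creates an empty list; later sights extend it.
    merged.setdefault(ele[0], []).extend(ele[1:])

rows = [[k] + val for k, val in merged.items()]
print(rows)
```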

You can also build the desired tuples inside the loop and then call od.values() to get them:

from collections import OrderedDict
from itertools import chain

od = OrderedDict()

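for ele in chain(name, occupation):
    k = ele[0]
    if k in od:
        # id already seen: append the remaining fields to the stored tuple
        od[k] = od[k] + ele[1:]
    else:
        # first time this id appears: store the whole tuple, id included
        od[k] = ele

# list() is needed on Python 3, where values() returns a view, not a list
print(list(od.values()))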
[(9, 'John', 'Smith', 'Builder'), (11, 'Bob', 'Dobbs', 'Baker'), (14, 'Joe', 'Bloggs', 'Candlestick Maker')]
Padraic Cunningham answered Oct 10 '22 04:10
