Is there an efficient way to merge two lists of tuples in Python, based on a common value? Currently, I'm doing the following:
name = [
    (9, "John", "Smith"),
    (11, "Bob", "Dobbs"),
    (14, "Joe", "Bloggs")
]
occupation = [
    (9, "Builder"),
    (11, "Baker"),
    (14, "Candlestick Maker")
]
name_and_job = []
for n in name:
    for o in occupation:
        if n[0] == o[0]:
            name_and_job.append((n[0], n[1], n[2], o[1]))
print(name_and_job)
returns:
[(9, 'John', 'Smith', 'Builder'), (11, 'Bob', 'Dobbs', 'Baker'), (14, 'Joe', 'Bloggs', 'Candlestick Maker')]
While this code works perfectly fine for small lists, it's incredibly slow for longer lists with millions of records. Is there a more efficient way to write this?
EDIT: The numbers in the first column are unique.
EDIT: Modified @John Kugelman's code slightly by adding a get(), in case a key in the names dictionary has no matching key in the occupation dictionary (jobs). In the output below, 9 has no entry in jobs:
>>> names_and_jobs = {id: names[id] + (jobs.get(id),) for id in names}
>>> print(names_and_jobs)
{9: ('John', 'Smith', None), 11: ('Bob', 'Dobbs', 'Baker'), 14: ('Joe', 'Bloggs', 'Candlestick Maker')}
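A minimal sketch of how that output can arise. It assumes, for illustration only, that id 9 is missing from jobs; the sample data in the question has an entry for every id.

```python
# Same names as in the question, stored as a dictionary keyed by id.
names = {
    9: ("John", "Smith"),
    11: ("Bob", "Dobbs"),
    14: ("Joe", "Bloggs"),
}
# Assumption for this sketch: id 9 has no job recorded.
jobs = {
    11: "Baker",
    14: "Candlestick Maker",
}

# jobs[9] would raise KeyError; jobs.get(9) returns None instead.
names_and_jobs = {key: names[key] + (jobs.get(key),) for key in names}
print(names_and_jobs)

# get() also accepts a second argument to use as the placeholder.
with_default = {key: names[key] + (jobs.get(key, "Unknown"),) for key in names}
print(with_default[9])
```

The loop variable is called key here rather than id only to avoid shadowing the built-in id() function.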
Use dictionaries instead of flat lists.
names = {
    9: ("John", "Smith"),
    11: ("Bob", "Dobbs"),
    14: ("Joe", "Bloggs")
}
jobs = {
    9: "Builder",
    11: "Baker",
    14: "Candlestick Maker"
}
If you need to convert them to this format, you can do:
>>> {id: (first, last) for id, first, last in name}
{9: ('John', 'Smith'), 11: ('Bob', 'Dobbs'), 14: ('Joe', 'Bloggs')}
>>> {id: job for id, job in occupation}
{9: 'Builder', 11: 'Baker', 14: 'Candlestick Maker'}
It'd then be a piece of cake to merge the two.
names_and_jobs = {id: names[id] + (jobs[id],) for id in names}
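If the result has to stay a list of tuples, as in the question, the same idea can be sketched end to end as below. The helper name merge_by_id is hypothetical, and the sketch assumes the ids are unique, as stated in the question. It builds one lookup dictionary and then makes a single pass over name, so the work grows with len(name) + len(occupation) instead of their product.

```python
def merge_by_id(name, occupation):
    """Join two lists of tuples on their first element (hypothetical helper)."""
    # One pass over occupation to build an id -> job lookup table.
    job_by_id = {key: job for key, job in occupation}
    # One pass over name; rows without a matching job are skipped,
    # which mirrors the behaviour of the nested loops in the question.
    return [row + (job_by_id[row[0]],) for row in name if row[0] in job_by_id]


name = [
    (9, "John", "Smith"),
    (11, "Bob", "Dobbs"),
    (14, "Joe", "Bloggs"),
]
occupation = [
    (9, "Builder"),
    (11, "Baker"),
    (14, "Candlestick Maker"),
]

name_and_job = merge_by_id(name, occupation)
print(name_and_job)
```

Each dictionary lookup takes constant time on average, which is where the speed-up over the nested loops comes from.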
from collections import OrderedDict
from itertools import chain
od = OrderedDict()
for ele in chain(name, occupation):
    od.setdefault(ele[0], []).extend(ele[1:])
print([[k]+val for k,val in od.items()])
[[9, 'John', 'Smith', 'Builder'], [11, 'Bob', 'Dobbs', 'Baker'], [14, 'Joe', 'Bloggs', 'Candlestick Maker']]
If you want the data ordered by how it appears in name, use an OrderedDict: before Python 3.7, plain dicts did not guarantee insertion order. From Python 3.7 onward a regular dict preserves insertion order as well.
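On Python 3.7 or later, where a plain dict keeps insertion order, the same grouping can be sketched without OrderedDict. This is a variation on the code above, not a replacement for it on older interpreters; the names merged and result are chosen here for illustration.

```python
from itertools import chain

name = [
    (9, "John", "Smith"),
    (11, "Bob", "Dobbs"),
    (14, "Joe", "Bloggs"),
]
occupation = [
    (9, "Builder"),
    (11, "Baker"),
    (14, "Candlestick Maker"),
]

# A plain dict: keys stay in first-seen order on Python 3.7+.
merged = {}
for ele in chain(name, occupation):
    # Collect every field after the id under that id.
    merged.setdefault(ele[0], []).extend(ele[1:])

result = [[k] + val for k, val in merged.items()]
print(result)
```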
You can also build the desired tuples inside the loop, then call od.values() to get the list of tuples:
from collections import OrderedDict
from itertools import chain
od = OrderedDict()
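for ele in chain(name, occupation):
    k = ele[0]
    if k in od:
        # Key already seen: append the remaining fields to the stored tuple.
        od[k] = od[k] + ele[1:]
    else:
        # First time this key appears: store the whole tuple, id included.
        od[k] = ele
# In Python 3, values() returns a view, so wrap it in list() to print a list.
print(list(od.values()))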
[(9, 'John', 'Smith', 'Builder'), (11, 'Bob', 'Dobbs', 'Baker'), (14, 'Joe', 'Bloggs', 'Candlestick Maker')]