Pythonic way to merge two overlapping lists, preserving order

Tags: python, merge, list, python-3.x

Alright, so I have two lists, as such:

They can and will have overlapping items, for example, [1, 2, 3, 4, 5], [4, 5, 6, 7].
There will not be additional items in the overlap, for example, this will not happen: [1, 2, 3, 4, 5], [3.5, 4, 5, 6, 7]
The lists are not necessarily ordered nor unique. [9, 1, 1, 8, 7], [8, 6, 7].

I want to merge the lists such that existing order is preserved, and to merge at the last possible valid position, and such that no data is lost. Additionally, the first list might be huge. My current working code is as such:

    master = [1,3,9,8,3,4,5]
    addition = [3,4,5,7,8]

    def merge(master, addition):
        n = 1
        while n < len(master):
            if master[-n:] == addition[:n]:
                return master + addition[n:]
            n += 1
        return master + addition

What I would like to know is - is there a more efficient way of doing this? It works, but I'm slightly leery of this, because it can run into large runtimes in my application - I'm merging large lists of strings.

EDIT: I'd expect the merge of [1,3,9,8,3,4,5] and [3,4,5,7,8] to be [1,3,9,8,3,4,5,7,8]. For clarity, the overlapping portion is 3,4,5.

[9, 1, 1, 8, 7], [8, 6, 7] should merge to [9, 1, 1, 8, 7, 8, 6, 7]

asked May 05 '15 14:05

Firnagzen

2 Answers

You can try the following:

    >>> a = [1, 3, 9, 8, 3, 4, 5]
    >>> b = [3, 4, 5, 7, 8]
    >>> matches = (i for i in xrange(len(b), 0, -1) if b[:i] == a[-i:])
    >>> i = next(matches, 0)
    >>> a + b[i:]
    [1, 3, 9, 8, 3, 4, 5, 7, 8]

The idea is to compare the first i elements of b (b[:i]) with the last i elements of a (a[-i:]). We try i in decreasing order, from the length of b down to 1 (xrange(len(b), 0, -1)), because we want to match as much as possible. next gives us the first i that matches, or the default value 0 if none does (next(matches, 0)). Once we have i, we append to a the elements of b from index i onward. On Python 3, use range in place of xrange.
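For reuse, the same idea can be wrapped in a function. The following is only a sketch for Python 3: the name merge_longest_overlap is made up for illustration, and capping the search at min(len(a), len(b)) is a small addition that is not in the answer above.

```python
def merge_longest_overlap(a, b):
    """Append b to a, dropping the longest prefix of b that is also a suffix of a."""
    # Assumption added here: an overlap cannot be longer than the shorter list,
    # so the search starts there instead of at len(b).
    longest = min(len(a), len(b))
    # Candidate overlap lengths, longest first.
    matches = (i for i in range(longest, 0, -1) if b[:i] == a[-i:])
    i = next(matches, 0)  # 0 means no overlap was found
    return a + b[i:]


print(merge_longest_overlap([1, 3, 9, 8, 3, 4, 5], [3, 4, 5, 7, 8]))
print(merge_longest_overlap([9, 1, 1, 8, 7], [8, 6, 7]))
```

Note that when more than one overlap is valid, this picks the longest one. The loop in the question counts upward from 1 and so stops at the shortest; for example, [1, 1] and [1, 1, 2] merge to [1, 1, 2] here.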

answered Sep 28 '22 17:09

JuniorCompressor

A couple of easy optimizations are possible.

You don't need to start at master[1], since the longest possible overlap starts at master[-len(addition)].
If you add a call to list.index, you can jump straight to the positions where addition[0] occurs, and avoid creating sub-lists and comparing lists for each index.

This approach keeps the code pretty understandable too (and easier to optimize by using Cython or PyPy):

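    master = [1,3,9,8,3,4,5]
    addition = [3,4,5,7,8]

    def merge(master, addition):
        if not addition:                             # nothing to merge
            return master + addition
        first = addition[0]
        n = max(len(master) - len(addition), 1)      # (1) earliest index where an overlap can start
        while True:
            try:
                n = master.index(first, n)           # (2) next position where addition[0] occurs
            except ValueError:
                return master + addition             # no overlap found

            # n is an index into master, so the overlap length is len(master) - n
            if master[n:] == addition[:len(master) - n]:
                return master + addition[len(master) - n:]
            n += 1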
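If the slice copies themselves are a concern on very large lists, the candidate check can compare elements in place. This is only a sketch: overlap_at and merge_no_slices are hypothetical names, and starting the search at index 0 for short lists is an assumption, not part of the answer above. Because it scans from the front, it returns the longest overlap.

```python
def overlap_at(master, addition, start):
    """Return True if master[start:] equals the beginning of addition (no slicing)."""
    tail_len = len(master) - start
    if tail_len > len(addition):
        return False
    # Compare element by element; all() stops at the first mismatch.
    return all(master[start + k] == addition[k] for k in range(tail_len))


def merge_no_slices(master, addition):
    if not addition:
        return list(master)
    first = addition[0]
    # Assumption: allow the overlap to cover all of master, hence 0 rather than 1.
    start = max(len(master) - len(addition), 0)
    while True:
        try:
            # Jump to the next position where addition[0] occurs in master.
            start = master.index(first, start)
        except ValueError:
            return master + addition  # no overlap found
        if overlap_at(master, addition, start):
            return master + addition[len(master) - start:]
        start += 1


print(merge_no_slices([1, 3, 9, 8, 3, 4, 5], [3, 4, 5, 7, 8]))
```

Whether this actually beats slice comparison in CPython would need measuring on real data, since slicing and list equality run in C while the generator runs in Python.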

answered Sep 28 '22 18:09

thebjorn
