Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Efficiently insert multiple elements in a list (or another data structure) keeping their order

I have a list of items that should be inserted in a list-like data structure one after the other, and I have the indexes at which each item should be inserted. For example:

items = ['itemX', 'itemY', 'itemZ']
indexes = [0, 0, 1]

The expected result is to have a list like this: result = ['itemY', 'itemZ', 'itemX'].

I'm able to get this result with this simple approach:

result = []
for index, item in zip(indexes, items):
    result.insert(index, item)

However, this is a very slow approach once lists become huge (the complexity is O(n^2)). Is there any (relatively simple to implement) way to improve my basic approach? I guess I have to look at other data structures while I insert elements and finally transform that data structure into my result list. Are trees a good option? Insert could be done maybe in O(log(n)) (instead of O(n)), but which specific tree-like structure should I use?

Or maybe something good can be achieved by just looking at all the indexes together (instead of using them one by one).

This is probably the worst case for my slow approach (always insert items at the beginning of the list):

n = 10**6 # some large number
items = list(range(n))
indexes = [0] * n
like image 303
Riccardo Bucco Avatar asked Aug 26 '21 13:08

Riccardo Bucco


People also ask

Which of the method is used to insert multiple elements in list collection?

extend() extend() can add multiple items to a list.

What data types can be stored in a list Python?

A Python list may contain different types! Indeed, you can store a number, a string, and even another list within a single list.

How do you store 3 values in Python?

Or, more general solution, create a list containing your 3 variables : my_3_variables = [title, author, date] and then put this list at the first position of your other list : my_list[0] = my_3_variables .


2 Answers

Here's python code for a treap with a size decoration that allows insertion at specific indexes, and reordering of whole contiguous sections. It was adapted from C++ code, Kimiyuki Onaka's solution to the Hackerrank problem, "Give Me the Order." (I cannot guarantee that this adaptation is bug free -- a copy of the original code is available in the description of this question.)

import random

class Treap:
  def __init__(self, value=None):
    self.value = value
    self.key = random.random()
    self.size = 1
    self.left = None
    self.right = None


def size(t):
  return t.size if t else 0


def update(t):
  if t:
    t.size = 1 + size(t.left) + size(t.right)
  return t


def merge(a, b):
  if not a:
    return b
  if not b:
    return a

  if a.key > b.key:
    a.right = merge(a.right, b)
    return update(a)
  else:
    b.left = merge(a, b.left)
    return update(b)


def split(t, i):
  if not t:
    return None, None

  if i <= size(t.left):
    u, t.left = split(t.left, i)
    return u, update(t)
  else:
    t.right, u = split(t.right, i - size(t.left) - 1)
    return update(t), u


def insert(t, i, value):
  left, right = split(t, i)
  u = Treap(value)
  return merge(merge(left, u), right)


def inorder(treap):
  if not treap:
    return

  if treap.left:
    inorder(treap.left)

  print(treap.value)

  if treap.right:
    inorder(treap.right)

Output:

lst = ['itemX', 'itemY', 'itemZ']
idxs = [0, 0, 1]

t = None

for i in range(len(lst)):
  t = insert(t, idxs[i], lst[i])

inorder(t)

"""
itemY
itemZ
itemX
"""
like image 172
גלעד ברקן Avatar answered Oct 11 '22 01:10

גלעד ברקן


You could use SortedList, neutralize its sorting with a constant key function, and only use it for its fast insertions. Needs version 1.5.10 or older, as insert got removed.

def insertions(indexes, items):
    tmp = SortedList(key=lambda _: 0)
    for index, item in zip(indexes, items):
        tmp.insert(index, item)
    return list(tmp)

(I imagine there's also something like it but without sorting that needs to be neutralized, sortedcontainers is just something I know.)

Benchmark results:

               indexes =   [0] * 10**6     [randint(0, i) for i in range(10**6)]
--------------------------------------------------------------------------------
original                  1540 seconds     759 seconds
neutralized SortedList      13 seconds      31 seconds
sorted mediants            201 seconds     249 seconds
sorted mediants optimized   42 seconds      72 seconds

Those last two solutions are another idea:

Use a SortedList the normal way, but annotate each item with a fraction from 0 to 1 (and order by that). To insert between two items, use those items' mediant.

from sortedcontainers import SortedList
from fractions import Fraction

def insertions(indexes, items):
    xs = SortedList([(Fraction(0), None), (Fraction(1), None)])
    for index, item in zip(indexes, items):
        a, c = xs[index][0].as_integer_ratio()
        b, d = xs[index + 1][0].as_integer_ratio()
        xs.add((Fraction(a+b, c+d), item))
    return [item for _, item in xs[1:-1]]

Optimized version doing fractions myself:

from sortedcontainers import SortedList

class X(tuple):
    def __lt__(self, other):
        return self[0] * other[1] < self[1] * other[0]

def insertions(indexes, items):
    xs = SortedList([X((0, 1, None)), X((1, 1, None))])
    for index, item in zip(indexes, items):
        L, R = xs[index : index+2]
        xs.add(X((L[0] + R[0], L[1] + R[1], item)))
    return [x[2] for x in xs[1:-1]]
like image 44
no comment Avatar answered Oct 11 '22 03:10

no comment