Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Most efficient way to find neighbors in list

I have a list of length 2016, but only 242 contain data, the rest is set to None. My aim is to interpolate between the values to fill all gaps with a simple form of IDW (inverse distance weighting). So the task of my script is:

  • Iterate over all items of myList
  • If the myList contains a value (that is not None), just copy it
  • If you find a "None" in myList, get position/value of the left and right neighbor by calculating the distance to all items in myList
  • Calculate an interpolated value for the gap from both neighbors (the farer they are away, the less weight they shall get)

Assume we have a smaller list of only 14 items (5 valid ones):

myList = [26, None, None, None, 31, None, None, 58, None, 42, None, None, None, 79]
resultList = [None] * len(myList)

for i in range(len(myList):
    if not myList[i] is None:
        resultList[i] = myList[i]
    else:
        distance = [i - j for j in range(len(myList)) if not myList[j] is None]
        neighbors = min([n for n in dist if n>0]), max([n for n in dist if n<0])
        # rest of the interpolation (not important for my question):
        neighbors_c = [(1/float(n))**2 for n in neighbors]
        c_sum = sum(neighbors_c)
        neighbors_c = [n/c_sum for n in neighbors_c]
        resultList = myList[i-neighbors[0]]*neighbors_c[0] + myList[i-neighbors[1]]*neighbors_c[1]

I am doing that for many many data sets. I found out that this method takes about 0.59sec per data set. What bothers me is the fact that my list is all sorted, but I will only need 2 values from it. So 99% of the distances are calculated for nothing. That led me to attempt two: stop the iteration after i-j becomes negative, because then obviously it ran into the closest values:

So instead of the list comprehension:

distance = [i - j for j in range(len(myList)) if not myList[j] is None]

I do a proper for-loop which I quit after distances pass zero and thus become larger again:

dist = []
for j in range(len(myList)):
    if not myList[j] is None:
        dist.append(i-j)
        if i-j < 0: break

With this method I was able to get down to 0.38sec per data set. When iterating over all items in myList, this second method is quick at the beginning (item is hit after 2nd, 3rd, 4th, ... loop and quit immediately), but there is no improvement for the last items, since iteration always starts at j=0.

I wonder if you can think of any quicker way to find the two neighbors of a specific number in a data set, without having to check ALL distances and only taking the largest negative and smalles positive one.

Also, I'm quite new to python, so please let me know if you find other un-pythonic expressions in my script. Thank you guys very much!

like image 286
offeltoffel Avatar asked Dec 14 '15 12:12

offeltoffel


1 Answers

UPDATE: Here's how to do it with numpy interp:

import numpy as np

myList = [26, None, None, None, 31, None, None, 58, None, 42, None, None, None, 79]

values = [(i, val) for i, val in enumerate(myList) if val is not None]

xp, fp = zip(*values)

print(xp) # (0, 4, 7, 9, 13)
print(fp) # (26, 31, 58, 42, 79)

result = np.interp(np.arange(len(myList)), xp, fp)
print(result) # [ 26.    27.25  28.5   29.75  31.    40.    49.    58.    50.    42.    51.25  60.5   69.75  79.  ]

Original Post:

As others have already suggested, your be best off with using some interpolation already implemented in numpy or pandas.

However for completeness here's a a quick solution I came up with:

myList = [26, None, None, None, 31, None, None, 58, None, 42, None, None, None, 79]

resultList = []

# first lets split the list into sublists that group the numbers
# and the Nones into groups
for i, item in enumerate(myList):
    if i == 0:
        resultList.append([item])
    else:
        if type(resultList[-1][-1]) == type(item):
            resultList[-1].append(item)
        else:
            resultList.append([item])

print(resultList) # [[26], [None, None, None], [31], [None, None], [58], [None], [42], [None, None, None], [79]]

# now lets interpolate the sublists that contain Nones
for i, item in enumerate(resultList):
    if item[0] is not None:
        continue

    # this is a bit problematic, what do we do if we have a None at the beginning or at the end?
    if i == 0 or i + 1 == len(resultList):
        continue

    prev_item = resultList[i - 1][-1]
    next_item = resultList[i + 1][0]

    difference = next_item - prev_item
    item_length = len(item) + 1

    for j, none_item in enumerate(item):
        item[j] = prev_item + float(j + 1) / item_length * difference

# flatten the list back
resultList = [item for sublist in resultList for item in sublist]

print(resultList) # [26, 27.25, 28.5, 29.75, 31, 40.0, 49.0, 58, 50.0, 42, 51.25, 60.5, 69.75, 79]

I suggest you use this only for learning or for simple cases as it does not handle cases where you have lists beginning or ending with None

like image 140
mirosval Avatar answered Sep 28 '22 13:09

mirosval