I have a list of length 2016, but only 242 contain data, the rest is set to None. My aim is to interpolate between the values to fill all gaps with a simple form of IDW (inverse distance weighting). So the task of my script is:
Assume we have a smaller list of only 14 items (5 valid ones):
myList = [26, None, None, None, 31, None, None, 58, None, 42, None, None, None, 79]
resultList = [None] * len(myList)
for i in range(len(myList):
if not myList[i] is None:
resultList[i] = myList[i]
else:
distance = [i - j for j in range(len(myList)) if not myList[j] is None]
neighbors = min([n for n in dist if n>0]), max([n for n in dist if n<0])
# rest of the interpolation (not important for my question):
neighbors_c = [(1/float(n))**2 for n in neighbors]
c_sum = sum(neighbors_c)
neighbors_c = [n/c_sum for n in neighbors_c]
resultList = myList[i-neighbors[0]]*neighbors_c[0] + myList[i-neighbors[1]]*neighbors_c[1]
I am doing that for many many data sets. I found out that this method takes about 0.59sec per data set. What bothers me is the fact that my list is all sorted, but I will only need 2 values from it. So 99% of the distances are calculated for nothing. That led me to attempt two: stop the iteration after i-j becomes negative, because then obviously it ran into the closest values:
So instead of the list comprehension:
distance = [i - j for j in range(len(myList)) if not myList[j] is None]
I do a proper for-loop which I quit after distances pass zero and thus become larger again:
dist = []
for j in range(len(myList)):
if not myList[j] is None:
dist.append(i-j)
if i-j < 0: break
With this method I was able to get down to 0.38sec per data set. When iterating over all items in myList, this second method is quick at the beginning (item is hit after 2nd, 3rd, 4th, ... loop and quit immediately), but there is no improvement for the last items, since iteration always starts at j=0.
I wonder if you can think of any quicker way to find the two neighbors of a specific number in a data set, without having to check ALL distances and only taking the largest negative and smalles positive one.
Also, I'm quite new to python, so please let me know if you find other un-pythonic expressions in my script. Thank you guys very much!
UPDATE:
Here's how to do it with numpy interp
:
import numpy as np
myList = [26, None, None, None, 31, None, None, 58, None, 42, None, None, None, 79]
values = [(i, val) for i, val in enumerate(myList) if val is not None]
xp, fp = zip(*values)
print(xp) # (0, 4, 7, 9, 13)
print(fp) # (26, 31, 58, 42, 79)
result = np.interp(np.arange(len(myList)), xp, fp)
print(result) # [ 26. 27.25 28.5 29.75 31. 40. 49. 58. 50. 42. 51.25 60.5 69.75 79. ]
Original Post:
As others have already suggested, your be best off with using some interpolation already implemented in numpy or pandas.
However for completeness here's a a quick solution I came up with:
myList = [26, None, None, None, 31, None, None, 58, None, 42, None, None, None, 79]
resultList = []
# first lets split the list into sublists that group the numbers
# and the Nones into groups
for i, item in enumerate(myList):
if i == 0:
resultList.append([item])
else:
if type(resultList[-1][-1]) == type(item):
resultList[-1].append(item)
else:
resultList.append([item])
print(resultList) # [[26], [None, None, None], [31], [None, None], [58], [None], [42], [None, None, None], [79]]
# now lets interpolate the sublists that contain Nones
for i, item in enumerate(resultList):
if item[0] is not None:
continue
# this is a bit problematic, what do we do if we have a None at the beginning or at the end?
if i == 0 or i + 1 == len(resultList):
continue
prev_item = resultList[i - 1][-1]
next_item = resultList[i + 1][0]
difference = next_item - prev_item
item_length = len(item) + 1
for j, none_item in enumerate(item):
item[j] = prev_item + float(j + 1) / item_length * difference
# flatten the list back
resultList = [item for sublist in resultList for item in sublist]
print(resultList) # [26, 27.25, 28.5, 29.75, 31, 40.0, 49.0, 58, 50.0, 42, 51.25, 60.5, 69.75, 79]
I suggest you use this only for learning or for simple cases as it does not handle cases where you have lists beginning or ending with None
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With