Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding k closest numbers to a given number

Tags:

python

closest

Say I have a list [1,2,3,4,5,6,7]. I want to find the 3 closest numbers to, say, 6.5. Then the returned value would be [5,6,7].

Finding one closest number is not that tricky in python, which can be done using

min(myList, key=lambda x:abs(x-myNumber))

But I am trying not to put a loop around this to find k closest numbers. Is there a pythonic way to achieve the above task?

like image 641
frazman Avatar asked Jun 09 '14 00:06

frazman


People also ask

How do I find the nearest number in an array?

The array may contain duplicate values and negative numbers. So if the array is like [2, 5, 6, 7, 8, 8, 9] and the target number is 4, then closest element is 5. We can solve this by traversing through the given array and keep track of absolute difference of current element with every element.

How do I find the nearest number in C#?

Or you could just use: var nearest = array. OrderBy(x => Math. Abs((long) x - targetNumber)).


3 Answers

The short answer

The heapq.nsmallest() function will do this neatly and efficiently:

>>> from heapq import nsmallest >>> s = [1,2,3,4,5,6,7] >>> nsmallest(3, s, key=lambda x: abs(x - 6.5)) [6, 7, 5] 

Essentially this says, "Give me the three input values that have the smallest absolute difference from the number 6.5".

Optimizing for repeated lookups

In the comments, @Phylliida, asked how to optimize for repeated lookups with differing start points. One approach would be to pre-sort the data and then use bisect to locate the center of a small search segment:

from bisect import bisect  def k_nearest(k, center, sorted_data):     'Return *k* members of *sorted_data* nearest to *center*'     i = bisect(sorted_data, center)     segment = sorted_data[max(i-k, 0) : i+k]     return nsmallest(k, segment, key=lambda x: abs(x - center)) 

For example:

>>> s.sort() >>> k_nearest(3, 6.5, s) [6, 7, 5] >>> k_nearest(3, 0.5, s) [1, 2, 3] >>> k_nearest(3, 4.5, s)     [4, 5, 3] >>> k_nearest(3, 5.0, s) [5, 4, 6] 
like image 154
Raymond Hettinger Avatar answered Sep 28 '22 23:09

Raymond Hettinger


You could compute distances, and sort:

[n for d, n in sorted((abs(x-myNumber), x) for x in myList)[:k]] 

This does the following:

  1. Create a sequence of tuples (d, x) where d is the distance to your target
  2. Select the first k elements of that list
  3. Extract just the number values from the result, discarding the distance
like image 25
Greg Hewgill Avatar answered Sep 28 '22 23:09

Greg Hewgill


Both answers were good, and Greg was right, Raymond's answer is more high level and easier to implement, but I built upon Greg's answer because it was easier to manipulate to fit my need.

In case anyone is searching for a way to find the n closest values from a list of dicts.

My dict looks like this, where npi is just an identifier that I need along with the value:

mydict = {u'fnpi': u'1982650024',
 u'snpi': {u'npi': u'1932190360', u'value': 2672},
 u'snpis': [{u'npi': u'1831289255', u'value': 20},
  {u'npi': u'1831139799', u'value': 20},
  {u'npi': u'1386686137', u'value': 37},
  {u'npi': u'1457355257', u'value': 45},
  {u'npi': u'1427043645', u'value': 53},
  {u'npi': u'1477548675', u'value': 53},
  {u'npi': u'1851351514', u'value': 57},
  {u'npi': u'1366446171', u'value': 60},
  {u'npi': u'1568460640', u'value': 75},
  {u'npi': u'1326046673', u'value': 109},
  {u'npi': u'1548281124', u'value': 196},
  {u'npi': u'1912989989', u'value': 232},
  {u'npi': u'1336147685', u'value': 284},
  {u'npi': u'1801894142', u'value': 497},
  {u'npi': u'1538182779', u'value': 995},
  {u'npi': u'1932190360', u'value': 2672},
  {u'npi': u'1114020336', u'value': 3264}]}

value = mydict['snpi']['value'] #value i'm working with below
npi = mydict['snpi']['npi'] #npi (identifier) i'm working with below
snpis = mydict['snpis'] #dict i'm working with below

To get an [id, value] list (not just a list of values) , I use this:

[[id,val] for diff, val, id in sorted((abs(x['value']-value), x['value'], x['npi']) for x in snpis)[:6]]

Which produces this:

[[u'1932190360', 2672],
 [u'1114020336', 3264],
 [u'1538182779', 995],
 [u'1801894142', 497],
 [u'1336147685', 284],
 [u'1912989989', 232]]

EDIT

I actually found it pretty easy to manipulate Raymond's answer too, if you're dealing with a dict (or list of lists).

from heapq import nsmallest
[[i['npi'], i['value']] for i in nsmallest(6, snpis, key=lambda x: abs(x['value']-value))]

This will produce the same as the above output.

And this

nsmallest(6, snpis, key=lambda x: abs(x['value']-value)) will produce a dict instead.

like image 34
tmthyjames Avatar answered Sep 28 '22 23:09

tmthyjames