Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

A star algorithm: Distance heuristics

I am using the A star algorithm as seen here (taken from http://code.activestate.com/recipes/578919-python-a-pathfinding-with-binary-heap/), but I have a problem I don't understand.

The heuristics given here is the square of the distance between two points. I found that if I take the square root of it instead, my results are more accurate, but the running time of the function increases dramatically (i.e. it goes over many, many more loops than before).

Why does the change in heuristics cause it to be more accurate, and take longer to run?

# Author: Christian Careaga ([email protected])
# A* Pathfinding in Python (2.7)
# Please give credit if used

import numpy
from heapq import *


def heuristic(a, b):
    return (b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2

def astar(array, start, goal):

    neighbors = [(0,1),(0,-1),(1,0),(-1,0),(1,1),(1,-1),(-1,1),(-1,-1)]

    close_set = set()
    came_from = {}
    gscore = {start:0}
    fscore = {start:heuristic(start, goal)}
    oheap = []

    heappush(oheap, (fscore[start], start))

    while oheap:

        current = heappop(oheap)[1]

        if current == goal:
            data = []
            while current in came_from:
                data.append(current)
                current = came_from[current]
            return data

        close_set.add(current)
        for i, j in neighbors:
            neighbor = current[0] + i, current[1] + j            
            tentative_g_score = gscore[current] + heuristic(current, neighbor)
            if 0 <= neighbor[0] < array.shape[0]:
                if 0 <= neighbor[1] < array.shape[1]:                
                    if array[neighbor[0]][neighbor[1]] == 1:
                        continue
                else:
                    # array bound y walls
                    continue
            else:
                # array bound x walls
                continue

            if neighbor in close_set and tentative_g_score >= gscore.get(neighbor, 0):
                continue

            if  tentative_g_score < gscore.get(neighbor, 0) or neighbor not in [i[1]for i in oheap]:
                came_from[neighbor] = current
                gscore[neighbor] = tentative_g_score
                fscore[neighbor] = tentative_g_score + heuristic(neighbor, goal)
                heappush(oheap, (fscore[neighbor], neighbor))

    return False

'''Here is an example of using my algo with a numpy array,
   astar(array, start, destination)
   astar function returns a list of points (shortest path)'''

nmap = numpy.array([
    [0,0,0,0,0,0,0,0,0,0,0,0,0,0],
    [1,1,1,1,1,1,1,1,1,1,1,1,0,1],
    [0,0,0,0,0,0,0,0,0,0,0,0,0,0],
    [1,0,1,1,1,1,1,1,1,1,1,1,1,1],
    [0,0,0,0,0,0,0,0,0,0,0,0,0,0],
    [1,1,1,1,1,1,1,1,1,1,1,1,0,1],
    [0,0,0,0,0,0,0,0,0,0,0,0,0,0],
    [1,0,1,1,1,1,1,1,1,1,1,1,1,1],
    [0,0,0,0,0,0,0,0,0,0,0,0,0,0],
    [1,1,1,1,1,1,1,1,1,1,1,1,0,1],
    [0,0,0,0,0,0,0,0,0,0,0,0,0,0]])

print astar(nmap, (0,0), (10,13))
like image 893
johannes Avatar asked Oct 18 '22 03:10

johannes


1 Answers

Why does the change in heuristics cause it to be more accurate, and take longer to run?

The first heuristic, distance squared, overestimates the real distance (by a lot, depending on the situation) even though the actual distance is computed the same way, because the actual distance is computed as the sum of single steps (the sum of squares is less than the square of the sum). A* tends to respond to that by not exploring enough to guarantee finding the optimal route, it prefers to just keep following whatever route it is trying because taking a step closer to the goal reduces the expected distance to it by a lot (by much more than the step itself takes, because the heuristic overestimates so much). It will often not back up (ie take some previously-opened node from the queue instead of the most-recent node) and try something else, because going back means the H value goes up more than the G value goes down.

So this has the two effects you saw:

  1. It's usually a lot faster (except in certain mazes where you can "trick" the algorithm into going the wrong way for longer than it otherwise would have)
  2. It does not necessarily find the best route

Your connectivity is the 8-neighbourhood, for which there is a better heuristic than Euclidean distance. Note that a path cannot have an arbitrary angle, it has to go straight or at a 45 degree angle, so Euclidean distance underestimates the distance even in the absence of obstacles. That is OK for correctness, but you can use the "diagonal distance" heuristic: (taken from here and easy to adapt to Python - that site also discusses the impact of having an overestimating heuristic)

function heuristic(node) =
    dx = abs(node.x - goal.x)
    dy = abs(node.y - goal.y)
    return D * (dx + dy) + (D2 - 2 * D) * min(dx, dy)

You would have D = 1, D2 = sqrt(2) to match your Euclidean distance metric.


There are some techniques that can be used to re-use some work if multiple paths share a source or destination (it doesn't matter which since it's symmetric anyway). For example, as you search from A to B, the G scores can be stored in a grid (they can even be left out of the nodes). Then on a search for a path towards A, those saved G scores represent the real distances to A. Obviously these could be used in order to have a perfect heuristic, but there is an even faster use: if a node that used such a perfect heuristic to calculate its F is drawn from the queue, then the shortest path is definitely through that node (since its F is the actual length of the path, and it is apparently the shortest since it came off of the priority queue), and moreover, you know the path with no further search (greedily follow the saved G scores back to A).

What this leads to is that every search for a path builds up information that can be used for an other search in the other direction. That search in the other direction then builds up information again for searches in that other direction and so forth. Some care should be taken - it is very easy to let the memory use explode. Likely not all information can be kept around.

This can probably be combined with Jump Point Search as well though there would be fewer G's to save so it's probably not very effective, mostly wasting a bunch of space.

like image 58
harold Avatar answered Oct 20 '22 21:10

harold