Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Addition and += give different results for list (depth first search)

I've been trying to understand depth first search, and using various sources online obtained the following code:

graph = {'A': ['B', 'D'], 'B': ['E'], 'C': [], 'D': ['C', 'E'], 
         'E': ['H', 'I', 'J'], 'F': [], 'G': [], 'H': ['F'], 
         'I': [], 'J': ['G']}

def dfs(graph, start, end, path=None):
    if path == None:
        path = []
    path = path + [start]
    paths = []
    if start == end:
        return [path]
    for neighbour in graph[start]:
        if neighbour not in path:
            paths.extend(dfs(graph, neighbour, end, path))
    return paths

print(dfs(graph, 'A', 'G'))

This outputs the desired result:

[['A', 'B', 'E', 'J', 'G'], ['A', 'D', 'E', 'J', 'G']]

But when I replace the line path = path + [start] with path += [start] (or path.extend([start]), which I understand does the same thing) I get the following output: [['A', 'B', 'E', 'H', 'F', 'I', 'J', 'G', 'D', 'C']]

I know this is to do with a difference in the operations, but I don't really understand how it applies here. Please could someone explain this?

like image 789
HelloWorld4444 Avatar asked Mar 08 '23 05:03

HelloWorld4444


1 Answers

this

path = path + [start]

is slightly different than

path += [start]  # or path.extend([start])

for lists (and other mutable types) because it recreates a new reference of path (implicitly what you want, you don't want to write into the previous reference)

path += [start]  # or path.extend([start])

reuses the same reference because you're passing path many times here in your loop (without making a copy):

for neighbour in graph[start]:
    if neighbour not in path:
       paths.extend(dfs(graph, neighbour, end, path))

so if some other object has it in storage, you're changing both lists: not the same behaviour, not the behaviour you want.

Of course you could do paths.extend(dfs(graph, neighbour, end, path.copy())) but that would be against the principle of least atonishment for the caller (which doesn't expect the last parameter to be modified)

My advice:

  • If you want performance and you don't care about reusing the reference or not, always use += for lists because it just extends the list. The + operator creates a new list and copies, which is really slower.
  • when using path = path + something with a list type, always include a comment for future maintainers (including yourself!) not to optimize that with +=.

maybe some more explicit and equivalent code:

path = path.copy() # or path[:], or list(path)...: force creation of a new reference to break dependency with passed parameter
path += [start]

Also note: it doesn't apply to str or tuple type because even += creates a new reference, because of the string immutability. No risk of sharing references with strings, so using tuple instead of list here would have fixed it as well:

if path == None:
    path = tuple()
path += (start,)  # += creates a new path reference, cos path is a tuple, immutable

(but don't expect more performance with += in that case, copies are made)

like image 112
Jean-François Fabre Avatar answered Apr 20 '23 00:04

Jean-François Fabre