I've been trying to understand depth first search, and using various sources online obtained the following code:
graph = {'A': ['B', 'D'], 'B': ['E'], 'C': [], 'D': ['C', 'E'],
'E': ['H', 'I', 'J'], 'F': [], 'G': [], 'H': ['F'],
'I': [], 'J': ['G']}
def dfs(graph, start, end, path=None):
if path == None:
path = []
path = path + [start]
paths = []
if start == end:
return [path]
for neighbour in graph[start]:
if neighbour not in path:
paths.extend(dfs(graph, neighbour, end, path))
return paths
print(dfs(graph, 'A', 'G'))
This outputs the desired result:
[['A', 'B', 'E', 'J', 'G'], ['A', 'D', 'E', 'J', 'G']]
But when I replace the line path = path + [start]
with path += [start]
(or path.extend([start])
, which I understand does the same thing) I get the following output: [['A', 'B', 'E', 'H', 'F', 'I', 'J', 'G', 'D', 'C']]
I know this is to do with a difference in the operations, but I don't really understand how it applies here. Please could someone explain this?
this
path = path + [start]
is slightly different than
path += [start] # or path.extend([start])
for lists (and other mutable types) because it recreates a new reference of path
(implicitly what you want, you don't want to write into the previous reference)
path += [start] # or path.extend([start])
reuses the same reference because you're passing path
many times here in your loop (without making a copy):
for neighbour in graph[start]:
if neighbour not in path:
paths.extend(dfs(graph, neighbour, end, path))
so if some other object has it in storage, you're changing both lists: not the same behaviour, not the behaviour you want.
Of course you could do paths.extend(dfs(graph, neighbour, end, path.copy()))
but that would be against the principle of least atonishment for the caller (which doesn't expect the last parameter to be modified)
My advice:
+=
for lists because it just extends the list. The +
operator creates a new list and copies, which is really slower.path = path + something
with a list
type, always include a comment for future maintainers (including yourself!) not to optimize that with +=
.maybe some more explicit and equivalent code:
path = path.copy() # or path[:], or list(path)...: force creation of a new reference to break dependency with passed parameter
path += [start]
Also note: it doesn't apply to str
or tuple
type because even +=
creates a new reference, because of the string immutability. No risk of sharing references with strings, so using tuple
instead of list
here would have fixed it as well:
if path == None:
path = tuple()
path += (start,) # += creates a new path reference, cos path is a tuple, immutable
(but don't expect more performance with +=
in that case, copies are made)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With