For a project I'm working on, I'm implementing a linked-list data-structure, which is based on the idea of a pair, which I define as:
class Pair:
def __init__(self, name, prefs, score):
self.name = name
self.score = score
self.preferences = prefs
self.next_pair = 0
self.prev_pair = 0
where self.next_pair
and self.prev_pair
are pointers to the previous and next links, respectively.
To set up the linked-list, I have an install function that looks like this.
def install(i, pair):
flag = 0
try:
old_pair = pair_array[i]
while old_pair.next_pair != 0:
if old_pair == pair:
#if pair in remainders: remainders.remove(pair)
return 0
if old_pair.score < pair.score:
flag = 1
if old_pair.prev_pair == 0: # we are at the beginning
old_pair.prev_pair = pair
pair.next_pair = old_pair
pair_array[i] = pair
break
else: # we are not at the beginning
pair.prev_pair = old_pair.prev_pair
pair.next_pair = old_pair
old_pair.prev_pair = pair
pair.prev_pair.next_pair = pair
break
else:
old_pair = old_pair.next_pair
if flag==0:
if old_pair == pair:
#if pair in remainders: remainders.remove(pair)
return 0
if old_pair.score < pair.score:
if old_pair.prev_pair==0:
old_pair.prev_pair = pair
pair.next_pair = old_pair
pair_array[i] = pair
else:
pair.prev_pair = old_pair.prev_pair
pair.next_pair = old_pair
old_pair.prev_pair = pair
pair.prev_pair.next_pair = pair
else:
old_pair.next_pair = pair
pair.prev_pair = old_pair
except KeyError:
pair_array[i] = pair
pair.prev_pair = 0
pair.next_pair = 0
Over the course of the program, I am building up a dictionary of these linked-lists, and taking links off of some and adding them in others. Between being pruned and re-installed, the links are stored in an intermediate array.
Over the course of debugging this program, I have come to realize that my understanding of the way Python passes arguments to functions is flawed. Consider this test case I wrote:
def test_install():
p = Pair(20000, [3, 1, 2, 50], 45)
print p.next_pair
print p.prev_pair
parse_and_get(g)
first_run()
rat = len(juggler_array)/len(circuit_array)
pref_size = get_pref_size()
print pref_size
print install(3, p)
print p.next_pair.name
print p.prev_pair
When I run this test, I get the following result.
0
0
10
None
10108
0
What I don't understand is why the second call to p.next_pair
produces a different result (10108
) than the first call (0
). install
does not return a Pair
object that can overwrite the one passed in (it returns None
), and it's not as though I'm passing install
a pointer.
My understanding of call-by-value is that the interpreter copies the values passed into a function, leaving the caller's variables unchanged. For example, if I say
def foo(x):
x = x+1
return x
baz = 2
y = foo(baz)
print y
print baz
Then 3
and 2
should be printed, respectively. And indeed, when I test that out in the Python interpreter, that's what happens.
I'd really appreciate it if anyone can point me in the right direction here.
In Python, everything is an object. Simple assignment stores a reference to the assigned object in the assigned-to name. As a result, it is more straightforward to think of Python variables as names that are assigned to objects, rather than objects that are stored in named locations.
For example:
baz = 2
... stores in baz
a pointer, or reference, to the integer object 2
which is stored elsewhere. (Since the type int
is immutable, Python actually has a pool of small integers and reuses the same 2
object everywhere, but this is an implementation detail that need not concern us much.)
When you call foo(baz)
, foo()
's local variable x
also points to the integer object 2
at first. That is, the foo()
-local name x
and the global name baz
are names for the same object, 2
. Then x = x + 1
is executed. This changes x
to point to a different object: 3
.
It is important to understand: x
is not a box that holds 2
, and 2
is then incremented to 3
. No, x
initially points to 2
and that pointer is then changed to point to 3
. Naturally, since we did not change what object baz
points to, it still points to 2
.
Another way to explain it is that in Python, all argument passing is by value, but all values are references to objects.
A counter-intuitive result of this is that if an object is mutable, it can be modified through any reference and all references will "see" the change. For example, consider this:
baz = [1, 2, 3]
def foo(x):
x[0] = x[0] + 1
foo(baz)
print baz
>>> [2, 2, 3]
This seems very different from our first example. But in reality, the argument is passed the same way. foo()
receives a pointer to baz
under the name x
and then performs an operation on it that changes it (in this case, the first element of the list is pointed to a different int
object). The difference is that the name x
is never pointed to a new object; it is x[0]
that is modified to point to a different object. x
itself still points to the same object as baz
. (In fact, under the hood the assignment to x[0]
becomes a method call: x.__setitem__()
.) Therefore baz
"sees" the modification to the list. How could it not?
You don't see this behavior with integers and strings because you can't change integers or strings; they are immutable types, and when you modify them (e.g. x = x + 1
) you are not actually modifying them but binding your variable name to a completely different object. If you change baz
to a tuple, e.g. baz = (1, 2, 3)
, you will find that foo()
gives you an error because you can`t assign to elements of a tuple; tuples are another immutable type. "Changing" a tuple requires creating a new one, and assignment then points the variable to the new object.
Objects of classes you define are mutable and so your Pair
instance can be modified by any function it is passed into -- that is, attributes may be added, deleted, or reassigned to other objects. None of these things will re-bind any of the names pointing to your object, so all the names that currently point to it will "see" the changes.
Python does not copy anything when passing variables to a function. It is neither call-by-value nor call-by-reference, but of those two it is more similar to call-by-reference. You could think of it as "call-by-value, but the value is a reference".
If you pass a mutable object to a function, then modifying that object inside the function will affect the object everywhere it appears. (If you pass an immutable object to a function, like a string or an integer, then by definition you can't modify the object at all.)
The reason this isn't technically pass-by-reference is that you can rebind a name so that the name refers to something else entirely. (For names of immutable objects, this is the only thing you can do to them.) Rebinding a name that exists only inside a function doesn't affect any names that might exist outside the function.
In your first example with the Pair
objects, you are modifying an object, so you see the effects outside of the function.
In your second example, you are not modifying any objects, you are just rebinding names to other objects (other integers in this case). baz
is a name that points to an integer object (in Python, everything is an object, even integers) with a value of 2. When you pass baz
to foo(x)
, the name x
is created locally inside the foo
function on the stack, and x
is set to the pointer that was passed into the function -- the same pointer as baz
. But x
and baz
are not the same thing, they only contain pointers to the same object. On the x = x+1
line, x
is rebound to point to an integer object with a value of 3, and that pointer is what is returned from the function and used to bind the integer object to y.
If you rewrote your first example to explicitly create a new Pair object inside your function based on the information from the Pair object passed into it (whether this is a copy you then modify, or if you make a constructor that modifies the data on construction) then your function would not have the side-effect of modifying the object that was passed in.
Edit: By the way, in Python you shouldn't use 0
as a placeholder to mean "I don't have a value" -- use None
. And likewise you shouldn't use 0
to mean False
, like you seem to be doing in flag
. But all of 0
, None
and False
evaluate to False
in boolean expressions, so no matter which of those you use, you can say things like if not flag
instead of if flag == 0
.
I suggest that you forget about implementing a linked list, and simply use an instance of a Python list
. If you need something other than the default Python list
, maybe you can use something from a Python module such as collections
.
A Python loop to follow the links in a linked list will run at Python interpreter speed, which is to say, slowly. If you simply use the built-in list
class, your list operations will happen in Python's C code, and you will gain speed.
If you need something like a list but with fast insertion and fast deletion, can you make a dict
work? If there is some sort of ID value (string or integer or whatever) that can be used to impose an ordering on your values, you could just use that as a key value and gain lightning fast insert and delete of values. Then if you need to extract values in order, you can use the dict.keys()
method function to get a list of key values and use that.
But if you really need linked lists, I suggest you find code written and debugged by someone else, and adapt it to your needs. Google search for "python linked list recipe" or "python linked list module".
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With