Just a fundamental question regarding python and .join() method:
file1 = open(f1,"r")
file2 = open(f2,"r")
file3 = open("results","w")
diff = difflib.Differ()
result = diff.compare(file1.read(),file2.read())
file3.write("".join(result)),
The above snippet of code yields a nice output stored in a file called "results", in string format, showing the differences between the two files line-by-line. However I notice that if I just print "result" without using .join(), the compiler returns a message that includes a memory address. After trying to write the result to the file without using .join(), I was informed by the compiler that only strings and character buffers may be used in the .join() method, and not generator objects. So based off of all the evidence that I have adduced, please correct me if I am wrong:
result = diff.compare(file1.read(),file2.read())
<---- result is a generator object?
result
is a list of strings, with result
itself being the reference to the first string?
.join()
takes a memory address and points to the first, and then iterates over the rest of the addresses of strings in that structure?
A generator object is an object that returns a pointer?
I apologize if my questions are unclear, but I basically wanted to ask the python veterans if my deductions were correct. My question is less about the observable results, and more so about the inner workings of python. I appreciate all of your help.
Python generators are a simple way of creating iterators. All the work we mentioned above are automatically handled by generators in Python. Simply speaking, a generator is a function that returns an object (iterator) which we can iterate over (one value at a time).
Iterators are the objects that use the next() method to get the next value of the sequence. A generator is a function that produces or yields a sequence of values using a yield statement. Classes are used to Implement the iterators. Functions are used to implement the generator.
You need to call next() or loop through the generator object to access the values produced by the generator expression. When there isn't the next value in the generator object, a StopIteration exception is thrown. A for loop can be used to iterate the generator object.
join
is a method of strings. That method takes any iterable and iterates over it and joins the contents together. (The contents have to be strings, or it will raise an exception.)
If you attempt to write the generator object directly to the file, you will just get the generator object itself, not its contents. join
"unrolls" the contents of the generator.
You can see what is going with a simple, explicit generator:
def gen():
yield 'A'
yield 'B'
yield 'C'
>>> g = gen()
>>> print g
<generator object gen at 0x0000000004BB9090>
>>> print ''.join(g)
ABC
The generator doles out its contents one at a time. If you try to look at the generator itself, it doesn't dole anything out and you just see it as "generator object". To get at its contents, you need to iterate over them. You can do this with a for
loop, with the next
function, or with any of various other functions/methods that iterate over things (str.join
among them).
When you say that result "is a list of string" you are getting close to the idea. A generator (or iterable) is sort of like a "potential list". Instead of actually being a list of all its contents all at once, it lets you peel off each item one at a time.
None of the objects is a "memory address". The string representation of a generator object (like that of many other objects) includes a memory address, so if you print it (as above) or write it to a file, you'll see that address. But that doesn't mean that object "is" that memory address, and the address itself isn't really usable as such. It's just a handy identifying tag so that if you have multiple objects you can tell them apart.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With