Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python generator objects and .join

Just a fundamental question regarding python and .join() method:

file1 = open(f1,"r")
file2 = open(f2,"r")
file3 = open("results","w")

diff = difflib.Differ()
result = diff.compare(file1.read(),file2.read())
file3.write("".join(result)),

The above snippet of code yields a nice output stored in a file called "results", in string format, showing the differences between the two files line-by-line. However I notice that if I just print "result" without using .join(), the compiler returns a message that includes a memory address. After trying to write the result to the file without using .join(), I was informed by the compiler that only strings and character buffers may be used in the .join() method, and not generator objects. So based off of all the evidence that I have adduced, please correct me if I am wrong:

  1. result = diff.compare(file1.read(),file2.read()) <---- result is a generator object?

  2. result is a list of strings, with result itself being the reference to the first string?

  3. .join() takes a memory address and points to the first, and then iterates over the rest of the addresses of strings in that structure?

  4. A generator object is an object that returns a pointer?

I apologize if my questions are unclear, but I basically wanted to ask the python veterans if my deductions were correct. My question is less about the observable results, and more so about the inner workings of python. I appreciate all of your help.

like image 596
eazar001 Avatar asked Jan 21 '13 20:01

eazar001


People also ask

What is a Python generator object?

Python generators are a simple way of creating iterators. All the work we mentioned above are automatically handled by generators in Python. Simply speaking, a generator is a function that returns an object (iterator) which we can iterate over (one value at a time).

What are iterators and generators in Python?

Iterators are the objects that use the next() method to get the next value of the sequence. A generator is a function that produces or yields a sequence of values using a yield statement. Classes are used to Implement the iterators. Functions are used to implement the generator.

How do you access the generator object in Python?

You need to call next() or loop through the generator object to access the values produced by the generator expression. When there isn't the next value in the generator object, a StopIteration exception is thrown. A for loop can be used to iterate the generator object.


1 Answers

join is a method of strings. That method takes any iterable and iterates over it and joins the contents together. (The contents have to be strings, or it will raise an exception.)

If you attempt to write the generator object directly to the file, you will just get the generator object itself, not its contents. join "unrolls" the contents of the generator.

You can see what is going with a simple, explicit generator:

def gen():
    yield 'A'
    yield 'B'
    yield 'C'

>>> g = gen()
>>> print g
<generator object gen at 0x0000000004BB9090>
>>> print ''.join(g)
ABC

The generator doles out its contents one at a time. If you try to look at the generator itself, it doesn't dole anything out and you just see it as "generator object". To get at its contents, you need to iterate over them. You can do this with a for loop, with the next function, or with any of various other functions/methods that iterate over things (str.join among them).

When you say that result "is a list of string" you are getting close to the idea. A generator (or iterable) is sort of like a "potential list". Instead of actually being a list of all its contents all at once, it lets you peel off each item one at a time.

None of the objects is a "memory address". The string representation of a generator object (like that of many other objects) includes a memory address, so if you print it (as above) or write it to a file, you'll see that address. But that doesn't mean that object "is" that memory address, and the address itself isn't really usable as such. It's just a handy identifying tag so that if you have multiple objects you can tell them apart.

like image 102
BrenBarn Avatar answered Sep 22 '22 08:09

BrenBarn