Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python: for loop in index assignment

While working through the awesome book "Programming Collective Intelligence", by Toby Segaran, I've encountered some techniques in index assignments I'm not entirely familiar with.

Take this for example:

createkey='_'.join(sorted([str(wi) for wi in wordids]))

or:

normalizedscores = dict([(u,float(l)/maxscore) for (u,l) in linkscores.items()])

All the nested tuples in the indexes have me a bit confused. What is actually being assigned to these varibles? I assumed obviously the .join one comes out as a string, but what about the latter? If someone could explain the mechanics of these loops I'd really appreciate it. I assume these are pretty common techniques, but being new to Python, I suppose to ask is a moment's shame. Thanks!

like image 719
DeaconDesperado Avatar asked Nov 30 '22 03:11

DeaconDesperado


2 Answers

[str(wi) for wi in wordids]

is a list comprehension.

a = [str(wi) for wi in wordids]

is the same as

a = []
for wi in wordids:
    a.append(str(wi))

So

createkey='_'.join(sorted([str(wi) for wi in wordids]))

creates a list of strings from each item in wordids, then sorts that list and joins it into a big string using _ as a separator.

As agf rightly noted, you can also use a generator expression, which looks just like a list comprehension but with parentheses instead of brackets. This avoids construction of a list if you don't need it later (except for iterating over it). And if you already have parentheses there like in this case with sorted(...) you can simply remove the brackets.

However, in this special case you won't be getting a performance benefit (in fact, it'll be about 10 % slower; I timed it) because sorted() will need to build a list anyway, but it looks a bit nicer:

createkey='_'.join(sorted(str(wi) for wi in wordids))

normalizedscores = dict([(u,float(l)/maxscore) for (u,l) in linkscores.items()])

iterates through the items of the dictionary linkscores, where each item is a key/value pair. It creates a list of key/l/maxscore tuples and then turns that list back into a dictionary.

However, since Python 2.7, you could also use dict comprehensions:

normalizedscores = {u:float(l)/maxscore for (u,l) in linkscores.items()}

Here's some timing data:

Python 3.2.2

>>> import timeit
>>> timeit.timeit(stmt="a = '_'.join(sorted([str(x) for x in n]))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
61.37724242267409
>>> timeit.timeit(stmt="a = '_'.join(sorted(str(x) for x in n))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
66.01814811313774

Python 2.7.2

>>> import timeit
>>> timeit.timeit(stmt="a = '_'.join(sorted([str(x) for x in n]))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
58.01728623923137
>>> timeit.timeit(stmt="a = '_'.join(sorted(str(x) for x in n))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
60.58927580777687
like image 69
Tim Pietzcker Avatar answered Dec 05 '22 03:12

Tim Pietzcker


Let's take the first one:

  1. str(wi) for wi in wordids takes each element in wordids and converts it to string.
  2. sorted(...) sorts them (lexicographically).
  3. '_'.join(...) merges the sorted word ids into a single string with underscores between entries.

Now the second one:

normalizedscores = dict([(u,float(1)/maxscore) for (u,l) in linkscores.items()])
  1. linkscores is a dictionary (or a dictionary-like object).
  2. for (u,l) in linkscores.items() iterates over all entries in the dictionary, for each entry assigning the key and the value to u and l.
  3. (u,float(1)/maxscore) is a tuple, the first element of which is u and the second element is 1/maxscore (to me, this looks like it might be a typo: float(l)/maxscore would make more sense -- note the lowercase letter el in place of one).
  4. dict(...) constructs a dictionary from the list of tuples, where the first element of each tuple is taken as the key and the second is taken as the value.

In short, it makes a copy of the dictionary, preserving the keys and dividing each value by maxscore.

like image 37
NPE Avatar answered Dec 05 '22 01:12

NPE