While working through the awesome book "Programming Collective Intelligence", by Toby Segaran, I've encountered some techniques in index assignments I'm not entirely familiar with.
Take this for example:
createkey='_'.join(sorted([str(wi) for wi in wordids]))
or:
normalizedscores = dict([(u,float(l)/maxscore) for (u,l) in linkscores.items()])
All the nested tuples in the indexes have me a bit confused. What is actually being assigned to these variables? I assumed the .join one obviously comes out as a string, but what about the latter? If someone could explain the mechanics of these loops I'd really appreciate it. I assume these are pretty common techniques, but being new to Python, I suppose to ask is a moment's shame. Thanks!
[str(wi) for wi in wordids]
is a list comprehension.
a = [str(wi) for wi in wordids]
is the same as
a = []
for wi in wordids:
    a.append(str(wi))
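To check the equivalence concretely, here is a small sketch that runs both forms on the same input. The sample list of integer ids is a hypothetical stand-in for wordids, and the snippet assumes Python 3.

```python
# Hypothetical sample input; wordids is assumed to be a list of integer ids.
wordids = [42, 7, 19]

# List comprehension form.
a_comp = [str(wi) for wi in wordids]

# Equivalent explicit loop.
a_loop = []
for wi in wordids:
    a_loop.append(str(wi))

print(a_comp)            # ['42', '7', '19']
print(a_comp == a_loop)  # True
```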
So
createkey='_'.join(sorted([str(wi) for wi in wordids]))
creates a list of strings from the items in wordids, then sorts that list and joins it into one big string, using _ as a separator.
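Here is a step-by-step sketch of the intermediate values, assuming Python 3 and a hypothetical list of integer ids. Note that the sort happens on the strings, so the order is lexicographic rather than numeric. The make_key helper is a hypothetical wrapper around the expression from the question, used only to show that input order does not change the result.

```python
def make_key(wordids):
    # Hypothetical helper wrapping the expression from the question.
    return '_'.join(sorted([str(wi) for wi in wordids]))

wordids = [33, 2, 10]  # hypothetical sample ids

as_strings = [str(wi) for wi in wordids]
print(as_strings)          # ['33', '2', '10']
in_order = sorted(as_strings)
print(in_order)            # ['10', '2', '33'] (string order, not numeric)
print('_'.join(in_order))  # 10_2_33

# Because the strings are sorted first, input order does not affect the key.
print(make_key([33, 2, 10]) == make_key([10, 33, 2]))  # True
```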
As agf rightly noted, you can also use a generator expression, which looks just like a list comprehension but uses parentheses instead of brackets. This avoids building a list when you don't need one (that is, when you only iterate over the items once). And if there are already parentheses around it, as in this case with sorted(...), you can simply drop the brackets.
However, in this particular case you won't get a performance benefit (in fact, it's about 10% slower; I timed it), because sorted() has to build a list anyway. It does look a bit nicer, though:
createkey='_'.join(sorted(str(wi) for wi in wordids))
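A quick sketch, assuming Python 3 and hypothetical sample ids, confirming that both spellings produce the same key. It also shows one practical difference: a generator expression yields its items lazily and can only be consumed once.

```python
wordids = [5, 12, 3]  # hypothetical sample ids

# Same key, built with a list comprehension and with a generator expression.
with_list = '_'.join(sorted([str(wi) for wi in wordids]))
with_gen = '_'.join(sorted(str(wi) for wi in wordids))
print(with_list == with_gen)  # True

# A generator expression produces its items lazily and can be consumed only once.
gen = (str(wi) for wi in wordids)
print(sorted(gen))  # ['12', '3', '5']
print(sorted(gen))  # [] (the generator is already exhausted)
```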
normalizedscores = dict([(u,float(l)/maxscore) for (u,l) in linkscores.items()])
iterates through the items of the dictionary linkscores, where each item is a (key, value) pair. It builds a list of (key, value/maxscore) tuples and then turns that list back into a dictionary.
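Spelled out as an explicit loop, the one-liner does roughly the following. This is a sketch under assumptions: the contents of linkscores are made up, and maxscore is assumed here to be the largest score in the dictionary (the snippet in the question does not show where it comes from).

```python
# Hypothetical sample data: linkscores maps an id to an integer score.
linkscores = {'a': 2, 'b': 5, 'c': 10}
maxscore = max(linkscores.values())  # assumption: normalize by the largest score

# The one-liner from the question.
normalizedscores = dict([(u, float(l) / maxscore) for (u, l) in linkscores.items()])

# The same thing spelled out: build the list of (key, value) tuples, then call dict().
pairs = []
for (u, l) in linkscores.items():
    pairs.append((u, float(l) / maxscore))
spelled_out = dict(pairs)

print(normalizedscores)                 # {'a': 0.2, 'b': 0.5, 'c': 1.0}
print(normalizedscores == spelled_out)  # True
```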
Since Python 2.7, you can also use a dict comprehension:
normalizedscores = {u:float(l)/maxscore for (u,l) in linkscores.items()}
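The float() call is presumably there because in Python 2, / between two integers performs floor division (unless from __future__ import division is in effect). In Python 3, / is always true division, so the conversion is no longer needed. A small sketch with hypothetical data, assuming Python 3:

```python
linkscores = {'a': 2, 'b': 5, 'c': 10}  # hypothetical sample data
maxscore = 10

# Dict comprehension, as in the answer.
with_float = {u: float(l) / maxscore for (u, l) in linkscores.items()}

# In Python 3, / between two ints is true division, so float() is not needed.
without_float = {u: l / maxscore for (u, l) in linkscores.items()}

print(with_float == without_float)  # True on Python 3
print(7 // 2, 7 / 2)                # 3 3.5
```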
Here's some timing data:
Python 3.2.2
>>> import timeit
>>> timeit.timeit(stmt="a = '_'.join(sorted([str(x) for x in n]))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
61.37724242267409
>>> timeit.timeit(stmt="a = '_'.join(sorted(str(x) for x in n))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
66.01814811313774
Python 2.7.2
>>> import timeit
>>> timeit.timeit(stmt="a = '_'.join(sorted([str(x) for x in n]))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
58.01728623923137
>>> timeit.timeit(stmt="a = '_'.join(sorted(str(x) for x in n))", setup="import random; n = [random.randint(0,1000) for i in range(100)]")
60.58927580777687
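The figures above are total seconds for timeit's default of 1,000,000 executions per statement. To rerun the comparison yourself, a sketch like the following uses the same statements with a much smaller number of runs so it finishes quickly; absolute times will depend on your machine and Python version, so no expected output is given.

```python
import timeit

setup = "import random; n = [random.randint(0, 1000) for i in range(100)]"
list_stmt = "a = '_'.join(sorted([str(x) for x in n]))"
gen_stmt = "a = '_'.join(sorted(str(x) for x in n))"

# number=10_000 instead of timeit's default 1_000_000 so this finishes quickly.
runs = 10_000
t_list = timeit.timeit(stmt=list_stmt, setup=setup, number=runs)
t_gen = timeit.timeit(stmt=gen_stmt, setup=setup, number=runs)

print(f"list comprehension:   {t_list:.3f} s for {runs} runs")
print(f"generator expression: {t_gen:.3f} s for {runs} runs")
```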
Let's take the first one:
str(wi) for wi in wordids
takes each element in wordids and converts it to a string.
sorted(...)
sorts them (lexicographically).
'_'.join(...)
merges the sorted word ids into a single string with underscores between entries.
Now the second one:
normalizedscores = dict([(u,float(1)/maxscore) for (u,l) in linkscores.items()])
linkscores
is a dictionary (or a dictionary-like object).
for (u,l) in linkscores.items()
iterates over all entries in the dictionary, assigning the key and the value of each entry to u and l.
(u,float(1)/maxscore)
is a tuple whose first element is u and whose second element is 1/maxscore. (To me, this looks like it might be a typo: float(l)/maxscore would make more sense; note the lowercase letter l in place of the digit 1.)
dict(...)
constructs a dictionary from the list of tuples, taking the first element of each tuple as the key and the second as the value.
In short, it makes a copy of the dictionary, preserving the keys and dividing each value by maxscore.
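To see why the digit 1 looks like a typo, here is a sketch with hypothetical data (Python 3 assumed) comparing the two spellings. It also confirms that dict(...) builds a new dictionary and leaves linkscores untouched.

```python
linkscores = {'a': 2, 'b': 5, 'c': 10}  # hypothetical sample data
maxscore = 10

# With the lowercase letter l: each value is divided by maxscore.
with_el = dict([(u, float(l) / maxscore) for (u, l) in linkscores.items()])
print(with_el)     # {'a': 0.2, 'b': 0.5, 'c': 1.0}

# With the digit 1: every value becomes the same constant 1/maxscore.
with_one = dict([(u, float(1) / maxscore) for (u, l) in linkscores.items()])
print(with_one)    # {'a': 0.1, 'b': 0.1, 'c': 0.1}

# dict(...) builds a new dictionary; the original is left unchanged.
print(linkscores)  # {'a': 2, 'b': 5, 'c': 10}
```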