I have the following code that works fine, and I was wondering how to implement the same logic using a list comprehension.
def get_features(document, feature_space):
    features = {}
    for w in feature_space:
        features[w] = (w in document)
    return features
Also am I going to get any improvements in performance by using a list comprehension?
The thing is that both feature_space and document are relatively big and many iterations will run.
Edit: Sorry for not making it clear at first, both feature_space and document are lists.
document is a list of words (a word may exist more than once!)
feature_space is a list of labels (features)

Like this, with a dict comprehension:
def get_features(document, feature_space):
    return {w: (w in document) for w in feature_space}
The features[key] = value expression becomes the key: value part at the start, and the rest of the for loop(s) and any if statements follow in nesting order.
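To illustrate the nesting-order rule with an if statement, here is a minimal sketch. The function names and the min_length filter are hypothetical additions for illustration only and are not part of the question:

```python
def get_long_features_loop(document, feature_space, min_length=3):
    # Explicit loop: the if statement is nested inside the for loop.
    features = {}
    for w in feature_space:
        if len(w) >= min_length:
            features[w] = (w in document)
    return features


def get_long_features_comp(document, feature_space, min_length=3):
    # Same logic: the key: value part comes first, then the for and the if
    # follow in the same order as they are nested in the loop version.
    return {w: (w in document) for w in feature_space if len(w) >= min_length}
```

Both functions should return the same dictionary for the same inputs; only the way it is built differs.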
Yes, this will give you a (modest) performance boost, because you've removed the repeated lookups of the local name features and the dict.__setitem__ calls.
Note that you need to make sure that document is a data structure that has fast membership tests. If it is a list, convert it to a set() first, for example, to ensure that membership tests take O(1) (constant) time, not the O(n) linear time of a list:
def get_features(document, feature_space):
    document = set(document)
    return {w: (w in document) for w in feature_space}
With a set, this is now an O(K) loop instead of an O(KN) loop (where N is the size of document and K the size of feature_space). Building the set adds a one-time O(N) cost, so the total is O(N + K).
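If you want to check the difference on your own data, a rough timing sketch with the standard timeit module could look like this. The function names and the sample sizes are made up for illustration, and the actual numbers will depend on your machine and inputs:

```python
import timeit


def get_features_list(document, feature_space):
    # Membership test scans the list: O(N) per lookup.
    return {w: (w in document) for w in feature_space}


def get_features_set(document, feature_space):
    # One-time O(N) conversion, then O(1) average-case lookups.
    words = set(document)
    return {w: (w in words) for w in feature_space}


if __name__ == "__main__":
    # Made-up sample data: 2000 words (with repeats) and 1000 features,
    # half of which occur in the document.
    document = [f"word{i % 1000}" for i in range(2000)]
    feature_space = [f"word{i}" for i in range(0, 2000, 2)]

    for func in (get_features_list, get_features_set):
        seconds = timeit.timeit(lambda: func(document, feature_space), number=5)
        print(f"{func.__name__}: {seconds:.4f} s for 5 runs")
```

The two functions return identical dictionaries; the set version should pull further ahead as document grows.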