I have a huge array of labels which I make unique via:
unique_train_labels = set(train_property_labels)
Which prints out as set([u'A', u'B', u'C']). I want to create a new set of unique labels with a new label called "no_region", and am using:
unique_train_labels_threshold = unique_train_labels.add('no_region')
However, this prints out to be None.
My ultimate aim is to use these unique labels to later generate a random array of categorical labels via:
rng = np.random.RandomState(101)
categorical_random = rng.choice(list(unique_train_labels), len(finalTestSentences))
categorical_random_threshold = rng.choice(list(unique_train_labels_threshold), len(finalTestSentences))
I read the docs as saying that set.add() should generate a new set, but that seems not to be the case (hence I can't later call list(unique_train_labels_threshold)).
As mentioned in Moses' answer, the set.add method mutates the original set; it does not create a new set. In Python it's conventional for methods that perform in-place mutation to return None. The built-in mutable types follow this convention (list.append, list.sort, dict.update and set.add all return None), and third-party libraries generally observe it too.
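A quick way to see the convention in action. The label values are illustrative. Each in-place method hands back None, while a non-mutating counterpart such as sorted() returns a new object.

```python
# In-place mutators return None; the object itself is what changes.
labels = {"A", "B", "C"}

result = labels.add("no_region")
print(result)             # None: add() changed `labels` in place
print(sorted(labels))     # ['A', 'B', 'C', 'no_region']

nums = [3, 1, 2]
print(nums.sort())        # None: list.sort() sorts in place
print(nums)               # [1, 2, 3]
print(sorted([3, 1, 2]))  # [1, 2, 3]: sorted() returns a new list instead

d = {"x": 1}
print(d.update(y=2))      # None: dict.update() also mutates in place
print(d)                  # {'x': 1, 'y': 2}
```

This is why assigning the result of labels.add(...) to a new name gives None.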
An alternative to using the .copy method is to use the .union method, which returns a new set that is the union of the original set and the set supplied as an argument. For sets, the | ("or") operator performs the same operation as .union.
a = {1, 2, 3}
b = a.union({5})
c = a | {4}
print(a, b, c)
output
{1, 2, 3} {1, 2, 3, 5} {1, 2, 3, 4}
The .union method (like other set methods that can be invoked via operator syntax) has a slight advantage over the operator syntax: you can pass it any iterable for its argument; the operator version requires you to explicitly convert the argument to a set (or frozenset).
a = {1, 2, 3}
b = a.union([5, 6])
c = a | set([7, 8])
print(a, b, c)
output
{1, 2, 3} {1, 2, 3, 5, 6} {1, 2, 3, 7, 8}
Using the explicit .union method is slightly more efficient here because it skips building a temporary set from the argument: internally, the method just iterates over the argument's contents, adding each item to the new set, so it doesn't care whether the argument is a set, list, tuple, string, or dict (a string contributes its individual characters, and a dict its keys).
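The sketch below illustrates how .union consumes different iterables. It also shows a pitfall relevant to the question: passing the bare string 'no_region' adds its characters, not the label. Wrapping the string in a list (or set) adds it as a single element. The sample sets are made up for illustration.

```python
base = {1, 2, 3}

# union() iterates over any iterable and adds each element it yields.
print(sorted(base.union((4, 5))))   # tuple -> [1, 2, 3, 4, 5]
letters = {"a"}.union("bc")         # a string yields its characters
print(sorted(letters))              # ['a', 'b', 'c']
keys = {"a"}.union({"x": 1, "y": 2})  # a dict yields its keys
print(sorted(keys))                 # ['a', 'x', 'y']

# union() also accepts several iterables at once.
print(sorted(base.union([4], (5,), range(6, 8))))  # [1, 2, 3, 4, 5, 6, 7]

# Pitfall: a bare string is split into characters.
labels = {"A", "B", "C"}
wrong = labels.union("no_region")    # adds 'n', 'o', '_', 'r', 'e', 'g', 'i'
right = labels.union(["no_region"])  # one-element list -> one new label
print("no_region" in wrong)  # False
print("no_region" in right)  # True
```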
From the official Python set docs:
Note, the non-operator versions of union(), intersection(), difference(), and symmetric_difference(), issubset(), and issuperset() methods will accept any iterable as an argument. In contrast, their operator based counterparts require their arguments to be sets. This precludes error-prone constructions like set('abc') & 'cbs' in favor of the more readable set('abc').intersection('cbs').
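To make the quoted rule concrete, here is the docs' own set('abc') / 'cbs' example run both ways. The method form accepts the string; the & operator raises TypeError until the argument is converted to a set or frozenset.

```python
a = set("abc")

common = a.intersection("cbs")  # method form: any iterable is accepted
print(sorted(common))           # ['b', 'c']
print(a.issubset("abcd"))       # True: issubset() also takes any iterable

operator_failed = False
try:
    a & "cbs"                   # operator form: the right operand must be a set
except TypeError:
    operator_failed = True
print(operator_failed)          # True

print(sorted(a & set("cbs")))      # ['b', 'c'] once the argument is a set
print(sorted(a | frozenset("d")))  # ['a', 'b', 'c', 'd']; a frozenset also works
```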
The set.add method mutates the set in place and returns None.
You should do:
unique_train_labels_threshold = unique_train_labels.copy()
unique_train_labels_threshold.add('no_region')
Using copy ensures mutations on the new set are not propagated to the old one.
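Below is a minimal end-to-end sketch of the question's pipeline, assuming NumPy is installed. The sample values for train_property_labels and finalTestSentences are made up. Sorting the labels before rng.choice is a suggested addition, not part of the original code. In Python 3.3+ string hashing is randomized per process, so a set of strings can iterate in a different order between runs. That would make the seeded RandomState pick different labels each time.

```python
import numpy as np

# Hypothetical stand-ins for the question's data.
train_property_labels = ["A", "B", "A", "C", "B"]
finalTestSentences = ["s1", "s2", "s3", "s4"]

unique_train_labels = set(train_property_labels)

# Option 1: copy, then mutate the copy (add() itself still returns None).
unique_train_labels_threshold = unique_train_labels.copy()
unique_train_labels_threshold.add("no_region")

# Option 2: build the new set in a single expression with union.
also_threshold = unique_train_labels | {"no_region"}

# The original set is unchanged either way.
print(sorted(unique_train_labels))            # ['A', 'B', 'C']
print(sorted(unique_train_labels_threshold))  # ['A', 'B', 'C', 'no_region']

rng = np.random.RandomState(101)
# sorted() fixes the label order so the seeded draws are reproducible.
categorical_random = rng.choice(sorted(unique_train_labels), len(finalTestSentences))
categorical_random_threshold = rng.choice(
    sorted(unique_train_labels_threshold), len(finalTestSentences)
)
print(categorical_random, categorical_random_threshold)
```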