I searched for a while but didn't find anything that explained exactly what I'm trying to do.
Basically I have a list of string labels, e.g. ["brown", "black", "blue", "brown", "brown", "black"] etc. What I want to do is convert this into a list of integers where each distinct label corresponds to one integer, so
["brown", "black", "blue", "brown", "brown", "black"]
becomes
[1, 2, 3, 1, 1, 2]
I looked into the enumerate function but when I gave it my list of strings (which is quite long), it assigned an int to each individual label, instead of giving the same label the same int:
[(1,"brown"),(2,"black"),(3,"blue"),(4,"brown"),(5,"brown"),(6,"black")]
I know how I could do this with a long and cumbersome for loop and if-else checks, but really I'm curious if there's a more elegant way to do this in only one or two lines.
You have non-unique labels, so you can use a defaultdict to generate numbers on first access, combined with a counter:
from collections import defaultdict
from itertools import count
from functools import partial
label_to_number = defaultdict(partial(next, count(1)))
[(label_to_number[label], label) for label in labels]
This numbers the labels in the order of each label's first occurrence in labels.
Demo:
>>> labels = ["brown", "black", "blue", "brown", "brown", "black"]
>>> label_to_number = defaultdict(partial(next, count(1)))
>>> [(label_to_number[label], label) for label in labels]
[(1, 'brown'), (2, 'black'), (3, 'blue'), (1, 'brown'), (1, 'brown'), (2, 'black')]
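The question asks for a plain list of integers rather than (number, label) pairs. The same defaultdict produces that directly if the comprehension keeps only the number. A minimal sketch, reusing the example labels from the question:

```python
from collections import defaultdict
from functools import partial
from itertools import count

labels = ["brown", "black", "blue", "brown", "brown", "black"]

# An unseen label triggers the default factory, which pulls the next number
# from count(1); a label seen before returns the number it already has
label_to_number = defaultdict(partial(next, count(1)))
numbers = [label_to_number[label] for label in labels]
print(numbers)  # [1, 2, 3, 1, 1, 2]
```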
Because we are using a dictionary, each label-to-number lookup takes constant time on average, so the whole operation takes time linear in the length of the labels list.
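The same single-pass, first-occurrence numbering also works with an ordinary dict and dict.setdefault(), if you would rather avoid the defaultdict/partial/count combination. This is a swapped-in alternative rather than the approach above; a sketch:

```python
labels = ["brown", "black", "blue", "brown", "brown", "black"]

label_to_number = {}
# The default argument is evaluated before the call, so for a new label it is
# the current dict size + 1 (the next unused number); for a known label,
# setdefault ignores it and returns the stored number
numbers = [label_to_number.setdefault(label, len(label_to_number) + 1)
           for label in labels]
print(numbers)  # [1, 2, 3, 1, 1, 2]
```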
Alternatively, use a set() to get the unique values, then map these to an enumerate() count:
label_to_number = {label: i for i, label in enumerate(set(labels), 1)}
[(label_to_number[label], label) for label in labels]
This assigns numbers more arbitrarily, as set() objects are not ordered (with string hash randomization the numbering can even differ between runs):
>>> label_to_number = {label: i for i, label in enumerate(set(labels), 1)}
>>> [(label_to_number[label], label) for label in labels]
[(2, 'brown'), (3, 'black'), (1, 'blue'), (2, 'brown'), (2, 'brown'), (3, 'black')]
This requires looping through labels twice, though.
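If you keep the two-pass structure but want a reproducible numbering, you can replace set() with something ordered. A sketch of two options: dict.fromkeys() keeps each label's first occurrence in order (dicts preserve insertion order from Python 3.7 on), and sorted() gives an alphabetical numbering:

```python
labels = ["brown", "black", "blue", "brown", "brown", "black"]

# Unique labels in order of first appearance
by_first_seen = {label: i for i, label in enumerate(dict.fromkeys(labels), 1)}
# Unique labels in alphabetical order
by_alphabet = {label: i for i, label in enumerate(sorted(set(labels)), 1)}

print([by_first_seen[label] for label in labels])  # [1, 2, 3, 1, 1, 2]
print([by_alphabet[label] for label in labels])    # [3, 1, 2, 3, 3, 1]
```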
Neither approach requires you to first define a dictionary of labels; the mapping is created automatically.