
Python: Enumerate a list of string 'keys' into ints

I searched for a while but didn't find anything that explained exactly what I'm trying to do.

Basically I have a list of string "labels", e.g. ["brown", "black", "blue", "brown", "brown", "black"] etc. What I want to do is convert this into a list of integers where each label corresponds to an integer, so

["brown", "black", "blue", "brown", "brown", "black"]

becomes

[1, 2, 3, 1, 1, 2]

I looked into the enumerate function but when I gave it my list of strings (which is quite long), it assigned an int to each individual label, instead of giving the same label the same int:

[(1,"brown"),(2,"black"),(3,"blue"),(4,"brown"),(5,"brown"),(6,"black")]

I know how I could do this with a long and cumbersome for loop and if-else checks, but really I'm curious if there's a more elegant way to do this in only one or two lines.

asked Dec 20 '22 04:12 by gpanders


1 Answer

You have non-unique labels; you can use a defaultdict to generate numbers on first access, combined with a counter:

from collections import defaultdict
from itertools import count
from functools import partial

label_to_number = defaultdict(partial(next, count(1)))
[(label_to_number[label], label) for label in labels]

This numbers the labels in order of each label's first occurrence in labels.

Demo:

>>> labels = ["brown", "black", "blue", "brown", "brown", "black"]
>>> label_to_number = defaultdict(partial(next, count(1)))
>>> [(label_to_number[label], label) for label in labels]
[(1, 'brown'), (2, 'black'), (3, 'blue'), (1, 'brown'), (1, 'brown'), (2, 'black')]

Because we are using a dictionary, each label-to-number lookup has constant cost on average, so the whole operation takes time linear in the length of the labels list.
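The question asks for a plain list of integers rather than (number, label) pairs. A minimal sketch of that variant, reusing the same defaultdict technique and simply dropping the label from the output (the name `numbers` is introduced here for illustration only):

```python
from collections import defaultdict
from functools import partial
from itertools import count

labels = ["brown", "black", "blue", "brown", "brown", "black"]

# The factory is called only for labels not seen before; each call
# draws the next integer from the shared counter.
label_to_number = defaultdict(partial(next, count(1)))

# Keep only the number for each label, in the original order.
numbers = [label_to_number[label] for label in labels]
print(numbers)  # [1, 2, 3, 1, 1, 2]
```

The mapping itself stays available afterwards in label_to_number, should the reverse lookup be needed.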

Alternatively, use a set() to get unique values, then map these to an enumerate() count:

label_to_number = {label: i for i, label in enumerate(set(labels), 1)}
[(label_to_number[label], label) for label in labels]

This assigns numbers more arbitrarily, as set() objects are not ordered; the exact numbering can differ from run to run:

>>> label_to_number = {label: i for i, label in enumerate(set(labels), 1)}
>>> [(label_to_number[label], label) for label in labels]
[(2, 'brown'), (3, 'black'), (1, 'blue'), (2, 'brown'), (2, 'brown'), (3, 'black')]

This requires looping through labels twice though: once to build the set and once to produce the output.

Neither approach requires you to first define a dictionary of labels; the mapping is created automatically.
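If the comprehension style is preferred but a stable numbering is wanted, one possible variation (not part of the answer above, and assuming Python 3.7+ where dicts preserve insertion order) is to swap set() for dict.fromkeys(), which drops duplicates while keeping first-occurrence order:

```python
labels = ["brown", "black", "blue", "brown", "brown", "black"]

# dict.fromkeys() keeps one key per distinct label, in the order each
# label first appears (guaranteed for dicts since Python 3.7).
unique_labels = dict.fromkeys(labels)

# Number the distinct labels starting from 1.
label_to_number = {label: i for i, label in enumerate(unique_labels, 1)}

numbers = [label_to_number[label] for label in labels]
print(numbers)  # [1, 2, 3, 1, 1, 2]
```

Like the set() version, this still passes over labels twice, but the result is the same on every run.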

answered Jan 08 '23 19:01 by Martijn Pieters