I have a list of strings. I want to assign a unique number to each string (the exact number is not important), and create a list of the same length using these numbers, in order. Below is my best attempt at it, but I am not happy for two reasons:
It assumes that the same values are next to each other
I had to start the list with a 0, otherwise the output would be incorrect
My code:
names = ['ll', 'll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'LL', 'HL', 'HL', 'HL']
numbers = [0]
num = 0
for item in range(len(names)):
    if item == len(names) - 1:
        break
    elif names[item] == names[item+1]:
        numbers.append(num)
    else:
        num = num + 1
        numbers.append(num)
print(numbers)
I want to make the code more generic, so it will work with an unknown list. Any ideas?
With enumerate and set: the enumerate function pairs each element with its index, so on its own it gives a different number to every position, duplicates included. If the list has duplicate elements, first reduce it to its unique values with set, then build a dictionary that maps each unique value to a number.
Without using an external library (check the EDIT for a Pandas solution), you can do it as follows:
d = {ni: indi for indi, ni in enumerate(set(names))}
numbers = [d[ni] for ni in names]
Brief explanation:
In the first line, you assign a number to each unique element in your list (stored in the dictionary d; you can easily create it using a dictionary comprehension; set returns the unique elements of names).
Then, in the second line, you do a list comprehension and store the actual numbers in the list numbers.
One example to illustrate that it also works fine for unsorted lists:
# 'll' appears all over the place
names = ['ll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'll', 'LL', 'HL', 'HL', 'HL', 'll']
This is one possible output for numbers (the exact values depend on the iteration order of the set, which can differ between runs):
[1, 1, 3, 3, 3, 2, 2, 1, 2, 0, 0, 0, 1]
As you can see, the number 1 associated with ll appears at the correct places.
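Since a set has no guaranteed order, the numbers assigned this way are arbitrary. If you would rather have reproducible numbers, handed out in order of first appearance, here is a minimal sketch of one way to do it in plain Python. The helper name number_by_first_appearance is made up for illustration and is not part of the approach above:

```python
def number_by_first_appearance(items):
    """Return one integer per item; equal items share the same integer.

    Numbers are handed out as 0, 1, 2, ... in order of first appearance.
    """
    ids = {}
    # setdefault only stores len(ids) when the item has not been seen yet,
    # so every new item receives the next unused integer.
    return [ids.setdefault(item, len(ids)) for item in items]


names = ['ll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'll', 'LL', 'HL', 'HL', 'HL', 'll']
numbers = number_by_first_appearance(names)
print(numbers)  # [0, 0, 1, 1, 1, 2, 2, 0, 2, 3, 3, 3, 0]
```

This makes a single pass over the list and only needs the items to be hashable, so it should also work for tuples.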
EDIT
If you have Pandas available, you can also use pandas.factorize (which seems to be quite efficient for huge lists and also works fine for lists of tuples as explained here):
import pandas as pd
pd.factorize(names)
will then return
(array([0, 0, 1, 1, 1, 2, 2, 0, 2, 3, 3, 3, 0]),
array(['ll', 'hl', 'LL', 'HL'], dtype=object))
Therefore,
numbers = pd.factorize(names)[0]