Assign a number to each unique value in a list

Tags: python, list

I have a list of strings. I want to assign a unique number to each string (the exact number is not important), and create a list of the same length using these numbers, in order. Below is my best attempt at it, but I am not happy for two reasons:

1. It assumes that the same values are next to each other.
2. I had to start the list with a 0, otherwise the output would be incorrect.

My code:

names = ['ll', 'll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'LL', 'HL', 'HL', 'HL']
numbers = [0]
num = 0
for item in range(len(names)):
    if item == len(names) - 1:
        break
    elif names[item] == names[item+1]:
        numbers.append(num)
    else:
        num = num + 1
        numbers.append(num)
print(numbers)

I want to make the code more generic, so it will work with an unknown list. Any ideas?

asked Feb 20 '17 16:02

millsy

1 Answer

Without using an external library (see the EDIT below for a Pandas solution), you can do it as follows:

d = {ni: indi for indi, ni in enumerate(set(names))}
numbers = [d[ni] for ni in names]

Brief explanation:

In the first line, you assign a number to each unique element of your list and store the mapping in the dictionary d. set(names) returns the unique elements of names, enumerate pairs each of them with a number, and the dictionary comprehension turns those pairs into the mapping.

Then, in the second line, a list comprehension looks up every element of names in d and stores the resulting numbers, in the original order, in the list numbers.
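
As a quick sanity check of the two steps, here is a small sketch that verifies the properties that matter without relying on which particular number a string receives. The sample list is only an illustration and not taken from the question.

```python
names = ['ll', 'hl', 'll', 'HL', 'hl', 'LL']

# Step 1: map each distinct string to a number.
d = {ni: indi for indi, ni in enumerate(set(names))}
# Step 2: look up the number for every element, in order.
numbers = [d[ni] for ni in names]

# The result has the same length as the input ...
same_length = len(numbers) == len(names)
# ... and two positions share a number exactly when they share a string.
consistent = all((a == b) == (x == y)
                 for a, x in zip(names, numbers)
                 for b, y in zip(names, numbers))
print(same_length, consistent)  # True True
```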

One example to illustrate that it also works fine for unsorted lists:

# 'll' appears all over the place
names = ['ll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'll', 'LL', 'HL', 'HL', 'HL', 'll']

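This is one possible output for numbers (which number each string gets depends on the iteration order of the set, which is arbitrary, so your run may differ):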

[1, 1, 3, 3, 3, 2, 2, 1, 2, 0, 0, 0, 1]

As you can see, the number 1 associated with ll appears in all the correct places.
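
If you would rather get the same numbers on every run, one possible variant (my own suggestion, not something the question requires) is to replace set with dict.fromkeys. It also removes duplicates, but it keeps the order of first appearance, which dictionaries guarantee since Python 3.7:

```python
names = ['ll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'll', 'LL', 'HL', 'HL', 'HL', 'll']

# dict.fromkeys drops duplicates but keeps the order of first appearance
# (guaranteed for dicts since Python 3.7).
d = {ni: indi for indi, ni in enumerate(dict.fromkeys(names))}
numbers = [d[ni] for ni in names]

print(numbers)  # [0, 0, 1, 1, 1, 2, 2, 0, 2, 3, 3, 3, 0]
```

With this variant the first distinct string always gets 0, the second 1, and so on.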

EDIT

If you have Pandas available, you can also use pandas.factorize (which seems to be quite efficient for huge lists and also works fine for lists of tuples as explained here):

import pandas as pd

pd.factorize(names)

will then return

(array([0, 0, 1, 1, 1, 2, 2, 0, 2, 3, 3, 3, 0]),
 array(['ll', 'hl', 'LL', 'HL'], dtype=object))

Therefore,

numbers = pd.factorize(names)[0]
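
If Pandas is not available, the shape of that result can be approximated in plain Python. This is only a sketch: factorize_list is a hypothetical helper name, not part of any library, and it returns plain lists rather than arrays.

```python
def factorize_list(values):
    """Return (codes, uniques), numbering values by first appearance."""
    index = {}   # value -> code
    codes = []
    for value in values:
        # setdefault assigns the next free code the first time a value is seen
        codes.append(index.setdefault(value, len(index)))
    return codes, list(index)


names = ['ll', 'll', 'hl', 'hl', 'hl', 'LL', 'LL', 'll', 'LL', 'HL', 'HL', 'HL', 'll']
numbers, uniques = factorize_list(names)
print(numbers)  # [0, 0, 1, 1, 1, 2, 2, 0, 2, 3, 3, 3, 0]
print(uniques)  # ['ll', 'hl', 'LL', 'HL']

# The list of uniques lets you map the numbers back to the original strings.
decoded = [uniques[n] for n in numbers]
```

Keeping the uniques around is useful whenever you later need to translate the numbers back into the strings they stand for.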

answered Sep 17 '22 13:09

Cleb
