
Duplicates when appending string to list from dataframe with common column value

Tags: python, pandas

Beginner here. I am trying to isolate the names of the neighborhoods in a dataframe of Toronto based on a cluster value I've assigned to them. Instead of a list of 3 unique items, I end up with a list 2363 items long.

Neigh_List = []
for n in toronto_merged['Cluster Labels']:
    if n == 7:
        x = toronto_merged['Neighborhood']
        Neigh_List.append(x) if x not in Neigh_List else None

Neigh_List

[0                                                                                                Parkwoods
 1                                                                                                Parkwoods
 2                                                                                         Victoria Village
 3                                                                                         Victoria Village
 4                                                                                         Victoria Village
                                                        ...                                                
 2359    Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
 2360    Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
 2361    Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
 2362    Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
 2363    Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
 Name: Neighborhood, Length: 2364, dtype: object]
Drakosfire asked Dec 19 '25 10:12



1 Answer

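Your loop assigns the entire 'Neighborhood' column to x on every iteration rather than the value from the current row, so the list ends up holding the whole column instead of individual names. In general, avoid looping over pandas dataframes for larger datasets (roughly 1000+ rows), since pandas' built-in vectorized functions are usually faster (see this other Stack Overflow post).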

You could try something like:

neigh_list = list(toronto_merged.loc[toronto_merged['Cluster Labels'] == 7, 'Neighborhood'].unique())
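To make the vectorized filter concrete, here is a minimal sketch on a small made-up dataframe. Only the column names 'Neighborhood' and 'Cluster Labels' come from the question; the rows are hypothetical stand-ins for toronto_merged.

```python
import pandas as pd

# Hypothetical sample data; the real toronto_merged has ~2364 rows.
toronto_merged = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Parkwoods", "Victoria Village",
                     "Rouge", "Malvern", "Rouge"],
    "Cluster Labels": [7, 7, 7, 2, 2, 7],
})

# Boolean mask: True for every row that belongs to cluster 7.
in_cluster = toronto_merged["Cluster Labels"] == 7

# Keep the 'Neighborhood' values of those rows, then drop repeats.
# unique() keeps values in order of first appearance.
neigh_list = toronto_merged.loc[in_cluster, "Neighborhood"].unique().tolist()

print(neigh_list)  # ['Parkwoods', 'Victoria Village', 'Rouge']
```

The mask is built from the cluster column, while the values are read from the name column; mixing the two up is an easy mistake to make here.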

Additionally, if you want to avoid duplicates in a list, you could use Python sets (see section 5.4 of the Python tutorial at the time of writing). Note that a set does not preserve insertion order.

unique_elements = set()
for x in some_iterable:
    unique_elements.add(x)

Or, using a set comprehension:

unique_elements = {unique_item for unique_item in some_iterable}
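Applied to the question, the set approach might look like the sketch below. It uses plain lists with made-up values in place of the two dataframe columns; with the real data, the same pairing could be done with zip(toronto_merged['Cluster Labels'], toronto_merged['Neighborhood']).

```python
# Hypothetical sample values standing in for the two dataframe columns.
cluster_labels = [7, 7, 7, 2, 2, 7]
neighborhoods = ["Parkwoods", "Parkwoods", "Victoria Village",
                 "Rouge", "Malvern", "Rouge"]

# Pair each label with the neighborhood on the same row and keep
# only the names whose label is 7; the set discards repeats.
unique_neighborhoods = {
    name
    for label, name in zip(cluster_labels, neighborhoods)
    if label == 7
}

# Sets are unordered, so sort to get a stable list.
neigh_list = sorted(unique_neighborhoods)

print(neigh_list)  # ['Parkwoods', 'Rouge', 'Victoria Village']
```

The key difference from the loop in the question is that each iteration looks at one row's name, not the whole column.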
jrbergen answered Dec 20 '25 23:12
