Beginner here. I am trying to isolate the names of neighborhoods from a dataframe of Toronto based on a cluster value I've assigned them. Instead of a list of 3 unique items, I end up with a list 2363 items long.
Neigh_List = []
for n in toronto_merged['Cluster Labels']:
    if n == 7:
        x = toronto_merged['Neighborhood']
        Neigh_List.append(x) if x not in Neigh_List else None
Neigh_List
[0 Parkwoods
1 Parkwoods
2 Victoria Village
3 Victoria Village
4 Victoria Village
...
2359 Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
2360 Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
2361 Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
2362 Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
2363 Mimico NW , The Queensway West , South of Bloor , Kingsway Park South West , Royal York South West
Name: Neighborhood, Length: 2364, dtype: object]
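In your loop, x is assigned the entire 'Neighborhood' column on every iteration, so the list ends up holding the whole Series rather than individual names. In general, looping over pandas dataframes should be avoided for larger datasets (roughly 1,000+ rows), as pandas' built-in vectorized functions are often faster (see this other Stack Overflow post).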
You could try something like:
neigh_list = list(toronto_merged.loc[toronto_merged['Cluster Labels'] == 7, 'Neighborhood'].unique())
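To see the filter-then-unique idea on something small, here is a minimal sketch. The dataframe below is made up for illustration (the real toronto_merged isn't shown in the question); only the column names 'Neighborhood' and 'Cluster Labels' come from the post.

```python
import pandas as pd

# Hypothetical stand-in for toronto_merged; the values are invented for the example.
toronto_merged = pd.DataFrame({
    "Neighborhood": ["Parkwoods", "Parkwoods", "Victoria Village", "Mimico NW", "Mimico NW"],
    "Cluster Labels": [7, 7, 2, 7, 7],
})

# Boolean mask: True for every row whose cluster label is 7.
mask = toronto_merged["Cluster Labels"] == 7

# Pick the Neighborhood column for the matching rows, then drop repeats.
# unique() keeps values in order of first appearance.
neigh_list = list(toronto_merged.loc[mask, "Neighborhood"].unique())

print(neigh_list)  # ['Parkwoods', 'Mimico NW']
```

The mask is built from the cluster column, and the neighborhood column is only selected afterwards, so no explicit loop is needed.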
Additionally, if you want to avoid duplicates in a list, you could use Python sets (see section 5.4 of the Python tutorial at the time of writing).
unique_elements = set()
for x in some_iterable:
    unique_elements.add(x)
Or, using a set comprehension:
unique_elements = {unique_item for unique_item in some_iterable}
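If you'd rather stay in plain Python, the same set idea can be combined with the cluster filter. This is only a sketch: the two lists below are made-up stand-ins for the dataframe's columns, not data from the question.

```python
# Hypothetical parallel lists standing in for the two dataframe columns.
cluster_labels = [7, 7, 2, 7, 7]
neighborhoods = ["Parkwoods", "Parkwoods", "Victoria Village", "Mimico NW", "Mimico NW"]

# Pair each label with its name, keep only names labelled 7;
# the set discards repeated names automatically.
unique_names = {name for label, name in zip(cluster_labels, neighborhoods) if label == 7}

# Sets are unordered, so sort when a stable order is needed.
print(sorted(unique_names))  # ['Mimico NW', 'Parkwoods']
```

Note that each name is added individually here, which is the step the loop in the question was missing.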