I have a list as follows:
[(220921998, 2426),
(220921999, 2427),
(220922000, 2428),
(220922001, 2429),
(220922563, 2991),
(220922564, 2992),
(220922565, 2993),
(220922566, 2994),
(220922575, 3003),
(220923958, 4386),
(220924161, 4589),
(220924170, 4598),
(220924171, 4599),
(220924172, 4600),
(220924173, 4601),
(220924912, 5340),
(220926340, 6768),
(220926341, 6769),
(220926342, 6770),
(220926343, 6771),
(220926344, 6772),
(220927052, 7480),
(220927053, 7481),
(220927054, 7482),
(220927055, 7483),
(220927056, 7484),
(220927069, 7497),
(220927071, 7499)]
I want to add a string to each tuple based on its second number. If the second numbers of tuples are within 20 or so of each other, they should be assigned the same 'project' name. See below:
[(220921998, 2426,project1),
(220921999, 2427,project1),
(220922000, 2428,project1),
(220922001, 2429,project1),
(220922563, 2991,project2),
(220922564, 2992,project2),
(220922565, 2993,project2),
(220922566, 2994,project2),
(220922575, 3003,project3),
(220923958, 4386,project4),
(220924161, 4589,project5),
(220924170, 4598,project5),
(220924171, 4599,project5),
(220924172, 4600,project5),
(220924173, 4601,project5),
(220924912, 5340,project6),
(220926340, 6768,project7),
(220926341, 6769,project7),
(220926342, 6770,project7),
(220926343, 6771,project7),
(220926344, 6772,project7),
(220927052, 7480,project8),
(220927053, 7481,project8),
(220927054, 7482,project8),
(220927055, 7483,project8),
(220927056, 7484,project8),
(220927069, 7497,project8),
(220927071, 7499,project8)]
I have tried groupby, but couldn't find a way to make it work for a range. Any help would be great. Thank you
Use a list comprehension to pull values out of a list of lists: [item[1] for item in list_of_lists] gives a list containing only the second element of each inner list. Calling set() on that result removes any duplicate elements.
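A minimal sketch of the comprehension-plus-set idea on a few values from the question (the names `pairs`, `second_values` and `unique_values` are chosen for this example). Note that set() only removes exact duplicates; it does not group values that are merely close to each other, and it does not keep order, so the result is sorted for display.

```python
# A few (id, value) tuples; the last one repeats the first on purpose
pairs = [(220921998, 2426), (220921999, 2427), (220922000, 2428), (220921998, 2426)]

# Keep only the second element of each tuple
second_values = [item[1] for item in pairs]

# set() drops duplicates; sorted() gives a stable order for printing
unique_values = sorted(set(second_values))
print(unique_values)  # [2426, 2427, 2428]
```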
In Python 2.x, range actually creates a list (which is also a sequence), whereas xrange creates an xrange object that can be used to iterate through the values.
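In Python 3, xrange no longer exists and range itself returns a lazy range object that still behaves like a sequence. A short sketch (the names `r` and `values` are just for illustration):

```python
r = range(2426, 2430)        # lazy range object; no list is built yet
print(type(r).__name__)      # range
print(len(r), r[0], r[-1])   # 4 2426 2429
print(2428 in r)             # True

values = list(r)             # build a real list only when one is needed
print(values)                # [2426, 2427, 2428, 2429]
```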
Python's range is half-open: it starts with the first argument of range() but does not include the second argument, so the last value it produces is end - 1.
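A quick illustration of the half-open behaviour, plus the usual way to express an inclusive "within 20" test for this question with a plain comparison instead of a range (the variable names are chosen for this sketch):

```python
# range(start, stop) includes start but stops before stop
print(list(range(1, 5)))  # [1, 2, 3, 4]

# To include an upper bound, pass stop + 1
low, high = 2426, 2429
inclusive = list(range(low, high + 1))
print(inclusive)  # [2426, 2427, 2428, 2429]

# "Within 20" is simpler as a comparison than as a range membership test
value, anchor = 2445, 2426
print(abs(value - anchor) <= 20)  # True
```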
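Use itertools.groupby with a key function that remembers the last item and compares it with the current item. groupby starts a new group whenever the key changes, so the key function only has to bump a counter when the gap to the previous value is larger than 20.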
lst = [(220921998, 2426),
(220921999, 2427),
(220922000, 2428),
(220922001, 2429),
(220922563, 2991),
(220922564, 2992),
(220922565, 2993),
(220922566, 2994),
(220922575, 3003),
(220923958, 4386),
(220924161, 4589),
....]
class Delta:
    def __init__(self, delta):
        self.last = None
        self.delta = delta
        self.key = 1
    def __call__(self, value):
        if self.last is not None and abs(self.last - value[1]) > self.delta:
            # Compare with the last value (`self.last`)
            # If the difference is larger than `self.delta` (20 here), advance to the next project
            self.key += 1
        self.last = value[1]  # Remember the last value.
        return self.key
import itertools
for key, grp in itertools.groupby(lst, key=Delta(20)):
    for tup in grp:
        print(tup + ('project{}'.format(key),))
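For reference, here is a self-contained run of the same Delta + groupby approach on the full list from the question, collecting the labelled tuples into a new list instead of printing them as they are produced (the names `data` and `labelled` are chosen for this sketch). Because Delta compares each value with the previous one, a run keeps the same label as long as every consecutive step is at most 20.

```python
import itertools

class Delta:
    """groupby key function: start a new group when a value jumps by more than `delta`."""
    def __init__(self, delta):
        self.last = None
        self.delta = delta
        self.key = 1

    def __call__(self, value):
        if self.last is not None and abs(self.last - value[1]) > self.delta:
            self.key += 1
        self.last = value[1]
        return self.key

data = [(220921998, 2426), (220921999, 2427), (220922000, 2428), (220922001, 2429),
        (220922563, 2991), (220922564, 2992), (220922565, 2993), (220922566, 2994),
        (220922575, 3003), (220923958, 4386), (220924161, 4589), (220924170, 4598),
        (220924171, 4599), (220924172, 4600), (220924173, 4601), (220924912, 5340),
        (220926340, 6768), (220926341, 6769), (220926342, 6770), (220926343, 6771),
        (220926344, 6772), (220927052, 7480), (220927053, 7481), (220927054, 7482),
        (220927055, 7483), (220927056, 7484), (220927069, 7497), (220927071, 7499)]

# One Delta instance is shared across the whole groupby pass
labelled = [
    tup + ('project{}'.format(key),)
    for key, grp in itertools.groupby(data, key=Delta(20))
    for tup in grp
]

for row in labelled:
    print(row)
```

With a strict threshold of 20, (220922575, 3003) lands in project2 (3003 - 2994 = 9), so this produces project1 to project7 rather than the eight projects in the desired output in the question. Also keep in mind that groupby only groups consecutive items, so the input should already be sorted by the second number, as it is here.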
If you use Python 3.x, you can use the following closure instead (see nonlocal):
def Delta(delta):
    last = None
    key = 1
    def keyfunc(value):
        nonlocal last, key
        if last is not None and abs(last - value[1]) > delta:
            key += 1
        last = value[1]
        return key
    return keyfunc
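The closure can replace the class directly: pass Delta(20) as the key argument exactly as before. A minimal Python 3 sketch on the first nine tuples from the question (the name `sample` is only for this example):

```python
import itertools

def Delta(delta):
    """Return a groupby key function that bumps the group number on jumps > delta."""
    last = None
    key = 1

    def keyfunc(value):
        nonlocal last, key
        if last is not None and abs(last - value[1]) > delta:
            key += 1
        last = value[1]
        return key

    return keyfunc

sample = [(220921998, 2426), (220921999, 2427), (220922000, 2428), (220922001, 2429),
          (220922563, 2991), (220922564, 2992), (220922565, 2993), (220922566, 2994),
          (220922575, 3003)]

for key, grp in itertools.groupby(sample, key=Delta(20)):
    print('project{}'.format(key), [tup[1] for tup in grp])
# project1 [2426, 2427, 2428, 2429]
# project2 [2991, 2992, 2993, 2994, 3003]
```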
Using the cluster package (https://pypi.python.org/pypi/cluster/1.1.0b1), with data being the list from the question:
>>> import cluster
>>> cl = cluster.HierarchicalClustering(data, lambda x,y: abs(x[1]-y[1]))
>>> cl.getlevel(20)
[
[(220926340, 6768), (220926341, 6769), (220926344, 6772), (220926342, 6770),
(220926343, 6771)],
[(220927052, 7480), (220927053, 7481), (220927056, 7484),
(220927054, 7482), (220927055, 7483), (220927069, 7497), (220927071, 7499)],
[(220921998, 2426), (220921999, 2427), (220922000, 2428), (220922001, 2429)],
[(220922575, 3003), (220922563, 2991), (220922564, 2992), (220922565, 2993),
(220922566, 2994)],
[(220924912, 5340)],
[(220923958, 4386)],
[(220924161, 4589), (220924170, 4598), (220924171, 4599), (220924172, 4600),
(220924173, 4601)]
]
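If you would rather avoid a third-party dependency, a similar grouping can be sketched with the standard library: sort by the second number, then start a new project whenever the gap to the previous value exceeds the threshold. In one dimension, splitting at gaps larger than the cutoff is what single-linkage clustering does (up to how a gap of exactly 20 is treated), and on the full list from the question it yields the same seven groups as the cluster output above. The function name label_projects and its threshold parameter are hypothetical, chosen for this sketch.

```python
def label_projects(rows, threshold=20):
    """Append a 'projectN' label to each (id, value) tuple.

    Rows are sorted by their second element; a new project starts whenever
    the gap to the previous value is larger than `threshold`.
    """
    result = []
    project = 0
    previous = None
    for row in sorted(rows, key=lambda r: r[1]):
        if previous is None or row[1] - previous > threshold:
            project += 1
        previous = row[1]
        result.append(row + ('project{}'.format(project),))
    return result

# Deliberately unsorted subset of the question's data
rows = [(220927071, 7499), (220921998, 2426), (220922575, 3003),
        (220921999, 2427), (220922566, 2994), (220927052, 7480)]

for labelled_row in label_projects(rows):
    print(labelled_row)
```

Unlike the groupby answer, this does not require pre-sorted input, but the result comes back ordered by the second number rather than in the original order.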