
Group values based on range of number in python

Tags:

python

I have a list as follows:

[(220921998, 2426),
(220921999, 2427),
(220922000, 2428),
(220922001, 2429),
(220922563, 2991),
(220922564, 2992),
(220922565, 2993),
(220922566, 2994),
(220922575, 3003),
(220923958, 4386),
(220924161, 4589),
(220924170, 4598),
(220924171, 4599),
(220924172, 4600),
(220924173, 4601),
(220924912, 5340),
(220926340, 6768),
(220926341, 6769),
(220926342, 6770),
(220926343, 6771),
(220926344, 6772),
(220927052, 7480),
(220927053, 7481),
(220927054, 7482),
(220927055, 7483),
(220927056, 7484),
(220927069, 7497),
(220927071, 7499)]

I want to add a string to each tuple based on its second number. If the second numbers are within 20 or so of each other, the tuples should be assigned the same 'project' name. See below:

[(220921998, 2426,project1),
(220921999, 2427,project1),
(220922000, 2428,project1),
(220922001, 2429,project1),
(220922563, 2991,project2),
(220922564, 2992,project2),
(220922565, 2993,project2),
(220922566, 2994,project2),
(220922575, 3003,project3),
(220923958, 4386,project4),
(220924161, 4589,project5),
(220924170, 4598,project5),
(220924171, 4599,project5),
(220924172, 4600,project5),
(220924173, 4601,project5),
(220924912, 5340,project6),
(220926340, 6768,project7),
(220926341, 6769,project7),
(220926342, 6770,project7),
(220926343, 6771,project7),
(220926344, 6772,project7),
(220927052, 7480,project8),
(220927053, 7481,project8),
(220927054, 7482,project8),
(220927055, 7483,project8),
(220927056, 7484,project8),
(220927069, 7497,project8),
(220927071, 7499,project8)]

I have tried groupby, but couldn't find a way to make it work for ranges. Any help would be great. Thank you.

asked Dec 07 '13 03:12 by msakya




2 Answers

Use itertools.groupby with a key function that remembers the last item and compares it with the current item.

lst = [(220921998, 2426),
       (220921999, 2427),
       (220922000, 2428),
       (220922001, 2429),
       (220922563, 2991),
       (220922564, 2992),
       (220922565, 2993),
       (220922566, 2994),
       (220922575, 3003),
       (220923958, 4386),
       (220924161, 4589),
       ....]

class Delta:
    def __init__(self, delta):
        self.last = None
        self.delta = delta
        self.key = 1
    def __call__(self, value):
        if self.last is not None and abs(self.last - value[1]) > self.delta:
            # Compare with the last value (`self.last`).
            # If the difference is larger than `self.delta`, advance to the next project.
            self.key += 1
        self.last = value[1]  # Remember the last value.
        return self.key

import itertools
for key, grp in itertools.groupby(lst, key=Delta(20)):
    for tup in grp:
        print(tup + ('project{}'.format(key),))

If you use Python 3.x, you can use the following closure instead (see nonlocal):

def Delta(delta):
    last = None
    key = 1
    def keyfunc(value):
        nonlocal last, key
        if last is not None and abs(last - value[1]) > delta:
            key += 1
        last = value[1]
        return key
    return keyfunc
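
The snippets above print each tuple; the question asks for a list. A minimal sketch of collecting the labelled tuples instead, assuming the input is already ordered by the second element (the names `make_gap_key`, `label_projects` and `sample` are made up for this illustration):

```python
import itertools


def make_gap_key(delta):
    """Return a groupby key function that starts a new group whenever
    the second tuple element jumps by more than `delta`."""
    last = None
    key = 1

    def keyfunc(value):
        nonlocal last, key
        if last is not None and abs(last - value[1]) > delta:
            key += 1          # gap too large: move on to the next project
        last = value[1]       # remember the value for the next comparison
        return key

    return keyfunc


def label_projects(rows, delta=20):
    """Append a 'projectN' string to every tuple in `rows`."""
    labelled = []
    for key, grp in itertools.groupby(rows, key=make_gap_key(delta)):
        name = 'project{}'.format(key)
        labelled.extend(tup + (name,) for tup in grp)
    return labelled


# A shortened, hypothetical sample taken from the question's data.
sample = [(220921998, 2426), (220921999, 2427), (220922563, 2991),
          (220922575, 3003), (220923958, 4386)]
result = label_projects(sample)
```

With a threshold of 20, the value 3003 ends up in the same group as the 299x values (the gap is smaller than 20), which differs from the expected output shown in the question.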
answered Oct 16 '22 15:10 by falsetru


Using the cluster package (https://pypi.python.org/pypi/cluster/1.1.0b1), with data being the list from the question:

>>> import cluster
>>> cl = cluster.HierarchicalClustering(data, lambda x,y: abs(x[1]-y[1]))
>>> cl.getlevel(20)
[
 [(220926340, 6768), (220926341, 6769), (220926344, 6772), (220926342, 6770), 
  (220926343, 6771)], 

 [(220927052, 7480), (220927053, 7481), (220927056, 7484), 
  (220927054, 7482), (220927055, 7483), (220927069, 7497), (220927071, 7499)], 

 [(220921998, 2426), (220921999, 2427), (220922000, 2428), (220922001, 2429)], 

 [(220922575, 3003), (220922563, 2991), (220922564, 2992), (220922565, 2993), 
  (220922566, 2994)], 

 [(220924912, 5340)], 

 [(220923958, 4386)], 

 [(220924161, 4589), (220924170, 4598), (220924171, 4599), (220924172, 4600), 
  (220924173, 4601)]
]
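
If installing an extra package is not an option, a similar grouping can be sketched with the standard library only: sort by the second element and start a new group wherever the gap to the previous value exceeds the threshold. This is an illustrative alternative, not the cluster package's own algorithm; `split_on_gaps` and the shortened `data` list are hypothetical names for this sketch.

```python
def split_on_gaps(rows, threshold=20):
    """Sort rows by their second element and start a new group
    whenever the gap to the previous row exceeds `threshold`."""
    groups = []
    previous = None
    for row in sorted(rows, key=lambda r: r[1]):
        if previous is None or row[1] - previous > threshold:
            groups.append([])     # gap too large: open a new group
        groups[-1].append(row)
        previous = row[1]
    return groups


# Deliberately unsorted sample taken from the question's data.
data = [(220922575, 3003), (220921998, 2426), (220922563, 2991),
        (220921999, 2427), (220923958, 4386)]
groups = split_on_gaps(data)
```

Unlike the groupby approach, this does not require the input to be ordered beforehand. On the question's full list with a threshold of 20 it should produce the same seven groups as the output above, in ascending order.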
answered Oct 16 '22 15:10 by Guy Gavriely