
Downsampling the number of entries in a list (without interpolation)

I have a Python list with a number of entries, which I need to downsample using either:

  • A maximum number of rows. For example, limiting a list of 1234 entries to 1000.
  • A proportion of the original rows. For example, making the list 1/3 its original length.

(I need to support both modes, but only one is used at a time.)

I believe that for the maximum number of rows I can just calculate the proportion needed and pass that to the proportional downsizer:

def downsample_to_max(self, rows, max_rows):
    # Convert the row limit into a proportion of the current length.
    return self.downsample_to_proportion(rows, max_rows / float(len(rows)))

...so I really only need one downsampling function. Any hints, please?

EDIT: The list contains objects, not numeric values so I do not need to interpolate. Dropping objects is fine.

SOLUTION:

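def downsample_to_proportion(self, rows, proportion):
    # Add `proportion` to a running total once per row and keep a row
    # each time the integer part of that total changes.
    counter = 0.0
    last_counter = None
    results = []

    for row in rows:
        counter += proportion

        if int(counter) != last_counter:
            results.append(row)
            last_counter = int(counter)

    return results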

Thanks.

Dave asked Jun 10 '10 08:06


2 Answers

You can use islice from itertools:

from itertools import islice

def downsample_to_proportion(rows, proportion=1):
    return list(islice(rows, 0, len(rows), int(1/proportion)))

Usage:

x = list(range(1, 10))
print(downsample_to_proportion(x, 0.3))
# [1, 4, 7]
tzaman answered Nov 20 '22 13:11
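One thing worth checking with the islice approach is the step itself: `int(1 / proportion)` truncates, so only whole-number steps are possible. The short sketch below prints the step for a few proportions (`effective_step` is a hypothetical helper name, not part of the answer). For the 1234-to-1000 case from the question the step comes out as 1, which would leave the list unchanged.

```python
def effective_step(proportion):
    # Same expression the answer uses for the islice step.
    return int(1 / proportion)


for proportion in (0.5, 0.4, 0.3, 1000 / 1234.0):
    step = effective_step(proportion)
    # 1.0 / step is the proportion of rows that would actually be kept.
    print(proportion, step, 1.0 / step)
```

So a requested proportion of 0.4 behaves like 0.5, and anything above 0.5 behaves like 1.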


Instead of islice() + list() it is more efficient to use slice syntax directly if the input is already a sequence type:

def downsample_to_proportion(rows, proportion):
    return rows[::int(1 / proportion)]
BlackJack answered Nov 20 '22 14:11
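If an exact row limit matters (for example 1234 entries down to exactly 1000), a fixed integer step cannot express it, because the step has to be a whole number. A minimal sketch of one alternative, assuming the input supports `len()` and indexing, is to pick evenly spaced indices with integer arithmetic. The name `downsample_to_count` is made up for illustration, and the rounding of the proportion to a target count is an assumption rather than something stated in the question.

```python
def downsample_to_count(rows, target):
    """Return at most `target` evenly spaced items from `rows` (hypothetical helper)."""
    n = len(rows)
    if target <= 0:
        return []
    if target >= n:
        return list(rows)
    # Integer arithmetic: i * n // target is strictly increasing when
    # target < n, so exactly `target` distinct rows are picked.
    return [rows[i * n // target] for i in range(target)]


def downsample_to_proportion(rows, proportion):
    """Keep roughly `proportion` of the rows by delegating to the count version."""
    # Assumption: round the target count to the nearest whole row.
    return downsample_to_count(rows, int(round(len(rows) * proportion)))


print(len(downsample_to_count(list(range(1234)), 1000)))   # 1000
print(downsample_to_proportion(list(range(1, 10)), 0.3))   # [1, 4, 7]
```

This inverts the arrangement in the question: the proportional version delegates to the count-based one, which avoids accumulating floating-point error across rows.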