Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

pythonic format for indices

I am after a string format to efficiently represent a set of indices. For example "1-3,6,8-10,16" would produce [1,2,3,6,8,9,10,16]

Ideally I would also be able to represent infinite sequences.

Is there an existing standard way of doing this? Or a good library? Or can you propose your own format?

thanks!

Edit: Wow! - thanks for all the well considered responses. I agree I should use ':' instead. Any ideas about infinite lists? I was thinking of using "1.." to represent all positive numbers.

The use case is for a shopping cart. For some products I need to restrict product sales to multiples of X, for others any positive number. So I am after a string format to represent this in the database.

like image 413
hoju Avatar asked Sep 26 '09 13:09

hoju


3 Answers

You don't need a string for that, This is as simple as it can get:

from types import SliceType

class sequence(object):
    def __getitem__(self, item):
        for a in item:
            if isinstance(a, SliceType):
                i = a.start
                step = a.step if a.step else 1
                while True:
                    if a.stop and i > a.stop:
                        break
                    yield i
                    i += step
            else:
                yield a

print list(sequence()[1:3,6,8:10,16])

Output:

[1, 2, 3, 6, 8, 9, 10, 16]

I'm using Python slice type power to express the sequence ranges. I'm also using generators to be memory efficient.

Please note that I'm adding 1 to the slice stop, otherwise the ranges will be different because the stop in slices is not included.

It supports steps:

>>> list(sequence()[1:3,6,8:20:2])
[1, 2, 3, 6, 8, 10, 12, 14, 16, 18, 20]

And infinite sequences:

sequence()[1:3,6,8:]
1, 2, 3, 6, 8, 9, 10, ...

If you have to give it a string then you can combine @ilya n. parser with this solution. I'll extend @ilya n. parser to support indexes as well as ranges:

def parser(input):
    ranges = [a.split('-') for a in input.split(',')]
    return [slice(*map(int, a)) if len(a) > 1 else int(a[0]) for a in ranges]

Now you can use it like this:

>>> print list(sequence()[parser('1-3,6,8-10,16')])
[1, 2, 3, 6, 8, 9, 10, 16]
like image 65
Nadia Alramli Avatar answered Sep 30 '22 13:09

Nadia Alramli


If you're into something Pythonic, I think 1:3,6,8:10,16 would be a better choice, as x:y is a standard notation for index range and the syntax allows you to use this notation on objects. Note that the call

z[1:3,6,8:10,16]

gets translated into

z.__getitem__((slice(1, 3, None), 6, slice(8, 10, None), 16))

Even though this is a TypeError if z is a built-in container, you're free to create the class that will return something reasonable, e.g. as NumPy's arrays.

You might also say that by convention 5: and :5 represent infinite index ranges (this is a bit stretched as Python has no built-in types with negative or infinitely large positive indexes).

And here's the parser (a beautiful one-liner that suffers from slice(16, None, None) glitch described below):

def parse(s):
    return [slice(*map(int, x.split(':'))) for x in s.split(',')]

There's one pitfall, however: 8:10 by definition includes only indices 8 and 9 -- without upper bound. If that's unacceptable for your purposes, you certainly need a different format and 1-3,6,8-10,16 looks good to me. The parser then would be

def myslice(start, stop=None, step=None):
    return slice(start, (stop if stop is not None else start) + 1, step)

def parse(s):
    return [myslice(*map(int, x.split('-'))) for x in s.split(',')]

Update: here's the full parser for a combined format:

from sys import maxsize as INF

def indices(s: 'string with indices list') -> 'indices generator':
    for x in s.split(','):
        splitter = ':' if (':' in x) or (x[0] == '-') else '-'
        ix = x.split(splitter)
        start = int(ix[0]) if ix[0] is not '' else -INF
        if len(ix) == 1:
            stop = start + 1
        else:
            stop = int(ix[1]) if ix[1] is not '' else INF
        step = int(ix[2]) if len(ix) > 2 else 1
        for y in range(start, stop + (splitter == '-'), step):
            yield y

This handles negative numbers as well, so

 print(list(indices('-5, 1:3, 6, 8:15:2, 20-25, 18')))

prints

[-5, 1, 2, 6, 7, 8, 10, 12, 14, 20, 21, 22, 23, 24, 25, 18, 19]

Yet another alternative is to use ... (which Python recognizes as the built-in constant Ellipsis so you can call z[...] if you want) but I think 1,...,3,6, 8,...,10,16 is less readable.

like image 29
ilya n. Avatar answered Sep 30 '22 15:09

ilya n.


This is probably about as lazily as it can be done, meaning it will be okay for even very large lists:

def makerange(s):
    for nums in s.split(","): # whole list comma-delimited
        range_ = nums.split("-") # number might have a dash - if not, no big deal
        start = int(range_[0])
        for i in xrange(start, start + 1 if len(range_) == 1 else int(range_[1]) + 1):
            yield i

s = "1-3,6,8-10,16"
print list(makerange(s))

output:

[1, 2, 3, 6, 8, 9, 10, 16]
like image 20
Mark Rushakoff Avatar answered Sep 30 '22 15:09

Mark Rushakoff