Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Data structure for custom indexing

I am looking to write a data structure to represent some genetic data. This data can be represented as a list of size n, where each entry also has a "genetic position" which is a real number between 0 and 1. To make nomenclature clear, I'll call the position in the list id and the genetic position gpos. The way I implemented this is as a class with

class Coords(object):

    def __init__(self, *args, **kwargs):
        self.f = list(*args, **kwargs)
        self.r = dict()
        for i,e in enumerate(self.f):
            self.r[e] = i

    def __setitem__(self,x,y):
        self.f.__setitem__(x,y)
        self.r.__setitem__(y,x)

    def __getitem__(self,x):
        return self.f.__getitem__(x)

    def __len__(self):
        return self.f.__len__()

now, I have two issues with this. The first one is that the indeces of self.r are floats, which is obviously a bad idea. I was thinking of converting them to strings (with a fixed number of digits), but is there a better idea? The other issue I have is that I want to implement accessing entries via gpos, so if I, for example, would like to access everything between gpos 0.2 and 0.4, I would like to be able to do that using

import numpy as np
Coords(np.arange(1,0,-.1))
c.r[0.2:0.4]

is there an easy way to define that? I was thinking of finding the correct id of the starting and ending positions using binary search and then access self.f using these ids, but is there a way to implement above syntax?

like image 489
bpeter Avatar asked Feb 16 '23 10:02

bpeter


1 Answers

When you index an object with a slice, Python creates a slice object with the inputs you provide. For example, if you do c[0.2:0.4], then the argument passed to c.__getitem__ will be slice(0.2, 0.4). So you could have something like this code in your __getitem__ method:

def __getitem__(self, x):
    if isinstance(x, slice):
        start = x.start
        stop = x.stop
        step = x.step
        # Do whatever you want to do to define your return
    ...

If you want to use this fancy indexing not on the Coords object, but in the self.r dictionary, I think the easiest would be to create a FancyIndexDict that is a subclass of dict, modify its __getitem__ method, and then have self.r be a FancyIndexDict, not a dict.

like image 145
Jaime Avatar answered Feb 18 '23 00:02

Jaime