
Do subclasses of pandas objects work differently from subclasses of other objects?

I am trying to subclass a pandas data structure so that, in my code, I can replace a subclass of dict with a subclass of Series. I don't understand why this example code doesn't work:

from pandas import Series    

class Support(Series):
    def supportMethod1(self):
        print('I am support method 1')
    def supportMethod2(self):
        print('I am support method 2')

class Compute(object):
    supp=None        
    def test(self):
        self.supp()  

class Config(object):
    supp=None        
    @classmethod
    def initializeConfig(cls):
        cls.supp=Support()
    @classmethod
    def setConfig1(cls):
        Compute.supp=cls.supp.supportMethod1
    @classmethod
    def setConfig2(cls):
        Compute.supp=cls.supp.supportMethod2            

Config.initializeConfig()

Config.setConfig1()    
c1=Compute()
c1.test()

Config.setConfig2()    
c1.test()

This is probably not the best way to change the configuration of some objects, but I found it useful in my code. Most of all, I want to understand why it works as I expect with a dict subclass but not with a Series subclass.

Thanks a lot!

Francesco asked Aug 16 '12 00:08


1 Answer

Current Answer (Pandas >= 0.13)

An internal refactor in Pandas 0.13 drastically simplified subclassing. Pandas Series can now be subclassed like any other Python object:

import pandas as pd

class MySeries(pd.Series):
    def my_method(self):
        return "my_method"
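
A plain subclass like the one above may still hand back an ordinary Series from operations such as cumsum(). The following is a minimal sketch, assuming a reasonably recent pandas version, of overriding the `_constructor` property that pandas consults when it builds results, so that they keep the subclass. The class name MySeries and its method are taken from the snippet above; the sample data is made up for illustration.

```python
import pandas as pd

class MySeries(pd.Series):
    @property
    def _constructor(self):
        # pandas uses this to build the result of most operations,
        # so returning the subclass keeps results as MySeries.
        return MySeries

    def my_method(self):
        return "my_method"

# Illustrative data only
s = MySeries([1, 2, 3])
result = s.cumsum()
print(type(result).__name__)
print(result.my_method())
```

Whether every operation preserves the subclass depends on the pandas version, so it is worth checking the specific methods you rely on.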

Legacy Answer (Pandas <= 0.12)

The problem is that Series defines __new__, which always instantiates a plain Series object, so calling Support() does not give you an instance of your subclass.
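
The effect can be reproduced without pandas. The sketch below is only an illustration of the mechanism, not pandas' actual code: Base, Sub and FixedSub are hypothetical names, and Base stands in for a class whose __new__ hard-codes the type it creates.

```python
class Base(object):
    def __new__(cls, *args, **kwargs):
        # Ignores cls and always builds a Base instance,
        # so subclasses never get instantiated.
        return object.__new__(Base)

class Sub(Base):
    def extra(self):
        return "extra"

class FixedSub(Base):
    def __new__(cls, *args, **kwargs):
        # Override __new__ so the requested class is actually created.
        return object.__new__(cls)

    def extra(self):
        return "extra"

obj = Sub()
fixed = FixedSub()
print(type(obj).__name__)    # the subclass was lost
print(type(fixed).__name__)  # the subclass was kept
```

Here Sub() silently yields a Base instance without the extra method, which is the same kind of surprise the question runs into.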

You can modify your class like so:

from pandas import Series

class Support(Series):
    def __new__(cls, *args, **kwargs):
        # Build the object through Series, then re-view it as the subclass
        arr = Series.__new__(cls, *args, **kwargs)
        return arr.view(Support)

    def supportMethod1(self):
        print('I am support method 1')
    def supportMethod2(self):
        print('I am support method 2')

However, it is probably better to use composition (has-a) rather than inheritance (is-a), or to monkey-patch the Series object. The reason is that you will often lose your subclass while using pandas because of the nature of its data storage. Something as simple as

s.ix[:5] 
s.cumsum()

will return a Series object instead of your subclass. Internally, the data is stored in contiguous arrays and optimized for speed. The data is only boxed with a class when needed, and those classes are hardcoded. Plus, it's not immediately obvious whether something like s.ix[:5] should return the same subclass. That would depend on the semantics of your subclass and what metadata is attached to it.
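
The has-a alternative can be sketched roughly as follows. This is one possible shape, not a prescribed design: the `data` attribute and the wrapped cumsum method are assumptions made for illustration, and the support methods return their strings instead of printing them so the result is easy to inspect.

```python
import pandas as pd

class Support(object):
    def __init__(self, *args, **kwargs):
        # Hold a Series instead of inheriting from it (has-a)
        self.data = pd.Series(*args, **kwargs)

    def supportMethod1(self):
        return 'I am support method 1'

    def supportMethod2(self):
        return 'I am support method 2'

    def cumsum(self):
        # Re-wrap the plain Series result so callers keep a Support object
        return Support(self.data.cumsum())

# Illustrative data only
s = Support([1, 2, 3])
t = s.cumsum()
print(type(t).__name__)
print(t.supportMethod1())
```

The wrapper decides explicitly which operations return a Support object, which sidesteps the question of what pandas itself should return.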

http://nbviewer.ipython.org/3366583/subclassing%20pandas%20objects.ipynb has some notes.

Dale Jung answered Oct 04 '22 08:10