Python seems to treat instance variable as a class variable

Tags: python, oop

I have an instance variable that seems to be treated like a class variable: changing it on one instance changes it on every instance.

import random

import pandas as pd


class DNA(object):

    def __init__(self, genes=pd.DataFrame(), sizecontrol=500, name='1'):
        self.name = name
        self.genes = genes  # This attribute should be an instance variable
        self.GeneLen = self.genes.shape[1]
        self.sizecontrol = sizecontrol
        self.Features = []
        self.BaseFeats = []
        random.seed(self.name)

When I run this I get the following:

 In[68]: df = pd.DataFrame(data)

 In[69]: x1 = DNA(genes=df)

 In[70]: x2 = DNA(genes=df)

 In[71]: x1.genes["dummy"] = 'test'

 In[72]: x2.genes["dummy"].head(4) 
 Out[72]:   
  0 test 
  1 test 
  2 test 
  3 test 

How can I make sure x1.genes does not affect x2.genes?

Arjan Groen asked Dec 19 '22 07:12


2 Answers

There are two issues here.

First, data frames are mutable objects, and both of your instances reference the same object. Supply a new copy to each instance using df.copy(). Alternatively, copy the data frame in the __init__ function itself. That is "safer" in that you can be sure no data frame is reused between instances, but it may also create unnecessary copies.
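
As a rough illustration of the call-site option, the sketch below passes df.copy() to each instance. The DNA class here is a stripped-down stand-in for the one in the question, and the sample data is hypothetical.

```python
import pandas as pd


class DNA:
    """Stripped-down stand-in for the question's class (stores the frame as-is)."""

    def __init__(self, genes):
        self.genes = genes


df = pd.DataFrame({"a": [1, 2, 3, 4]})  # hypothetical sample data

# Each instance receives its own copy, so the caller decides when to copy.
x1 = DNA(genes=df.copy())
x2 = DNA(genes=df.copy())

x1.genes["dummy"] = "test"

print("dummy" in x1.genes.columns)  # True
print("dummy" in x2.genes.columns)  # False
```

With this approach the class stays cheap to construct, but every caller has to remember to copy.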

Second, and not relevant in your example, there is an issue with supplying a mutable default argument, genes = pd.DataFrame(). This data frame is created once, when the function is defined, and stored on the __init__ function object as if it were member data of that function (see DNA.__init__.__defaults__ in Python 3, or __init__.__func__.func_defaults in Python 2). Every call that omits genes therefore receives the same data frame. Instead, use a default argument of None or some other sentinel value, and instantiate a new data frame when genes is None.
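
A minimal sketch of the difference, assuming Python 3. The class names SharedDefault and FreshDefault are made up for this illustration and are not part of the question's code.

```python
import pandas as pd


class SharedDefault:
    # The default DataFrame is built once, when the def statement runs.
    def __init__(self, genes=pd.DataFrame()):
        self.genes = genes


class FreshDefault:
    # None is a sentinel; a new DataFrame is built on each call that needs one.
    def __init__(self, genes=None):
        self.genes = pd.DataFrame() if genes is None else genes


a, b = SharedDefault(), SharedDefault()
print(a.genes is b.genes)  # True: both instances hold the one default frame
print(a.genes is SharedDefault.__init__.__defaults__[0])  # True: it lives on the function

c, d = FreshDefault(), FreshDefault()
print(c.genes is d.genes)  # False: each call built its own frame
```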

Jared Goguen answered Dec 21 '22 10:12


Your code is working fine in the sense that genes is an attribute of instances of the DNA class.

However, you only ever created one dataframe. You assign the name df to it and also make it the attribute genes of both x1 and x2 with the

self.genes = genes

assignment. Since assignment never copies data, you still have only one dataframe, which is shared across x1 and x2.
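
The same behaviour can be seen without any class at all. This is only a small sketch with hypothetical sample data; the name alias is chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4]})  # hypothetical sample data

alias = df  # binds a second name to the same object; nothing is copied
print(alias is df)  # True

alias["dummy"] = "test"
print(list(df.columns))  # ['a', 'dummy']
```

Checking x1.genes is x2.genes in the original session would report True for the same reason.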

To solve the issue, you could either make a copy of your dataframe before passing it to the DNA constructor or use

self.genes = genes.copy()

in the __init__ method.
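
A sketch of the second option, reduced to the genes attribute only. The other attributes from the question are omitted and the sample data is hypothetical.

```python
import pandas as pd


class DNA:
    """Reduced version of the question's class; only genes is kept."""

    def __init__(self, genes):
        # Copy defensively so the instance never shares the caller's frame.
        self.genes = genes.copy()


df = pd.DataFrame({"a": [1, 2, 3, 4]})  # hypothetical sample data

x1 = DNA(genes=df)
x2 = DNA(genes=df)

x1.genes["dummy"] = "test"

print("dummy" in x2.genes.columns)  # False
print("dummy" in df.columns)        # False
```

Here the caller's df is left untouched as well, at the cost of one copy per instance.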

timgeb answered Dec 21 '22 09:12