
Assign a pandas dataframe to an object as a static class variable - memory use (Python)

Tags:

python-3.x

I have a Python class called DNA, and I want to create 100 instances of it. Each instance holds a pandas DataFrame that is identical across all instances. To avoid duplicating it, I want to store this DataFrame as a static (class) attribute.

import pandas as pd
some_df = pd.DataFrame()

class DNA(object):
  df = some_variable  # Do I declare here?

  def __init__(self,df = pd.DataFrame(), name='1'):
    self.name = name
    self.instance_df = instance_df  # I want to avoid this
    DNA.some_df = df  # Does this duplicate the data for every instance?

What is the correct way to do this?

  • Can I use the __init__ method to create the class variable? Or will it create a separate class variable for every instance of the class?
  • Do I need to declare the class variable between the 'class ...' line and 'def __init__(...)'?
  • Some other way?

I want to be able to change the DataFrame that I use as a class variable, but once the class is loaded, all instances need to reference the same value (i.e. the same memory).

Arjan Groen Avatar asked Mar 01 '26 09:03



1 Answer

I've answered your questions in the code comments below:

import pandas as pd
some_df = pd.DataFrame()

class DNA(object):
  df = some_variable  # You assign here. I would use `some_df`

  def __init__(self,df = pd.DataFrame(), name='1'):
    self.name = name
    self.instance_df = instance_df  # Yes, avoid this
    DNA.some_df = df  # This does not duplicate: assignment **never copies in Python**. However, I advise against this
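
One way to check the last annotated line is to compare object identities with `is`. The sketch below is only an illustration: the column name `base` and the instance names are made up for the example, and the class is trimmed down to the one assignment under discussion.

```python
import pandas as pd

class DNA:
    def __init__(self, df=pd.DataFrame(), name="1"):
        self.name = name
        DNA.some_df = df  # rebinds the class attribute on every call; no copy

a = DNA(name="a")
default_df = DNA.some_df
b = DNA(name="b")

# The default DataFrame was built once, when the `def` statement ran,
# so both calls bound the class attribute to that same object.
same_default = DNA.some_df is default_df and a.some_df is b.some_df

# Passing a different frame rebinds the attribute for every instance.
other_df = pd.DataFrame({"base": list("ACGT")})
c = DNA(df=other_df, name="c")
a_sees_other = a.some_df is other_df
```

Both `same_default` and `a_sees_other` come out `True`: nothing is copied, but constructing one instance can change what all the others see.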

So, using

DNA.some_df = df

inside __init__ does work, and it does not copy anything. Because default arguments are evaluated only once, at function definition time, that df is the same object on every call unless you explicitly pass a different df to __init__. Doing so rebinds the class attribute for every instance, which smacks of bad design to me. Rather, you probably want something like:

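import pandas as pd

class DNA(object):

  def __init__(self, name='1'):  # no df parameter: the frame lives on the class
    self.name = name

# ... some work to construct a dataframe ...
df = final_processing_function()  # placeholder for your own function

DNA.df = df  # assign once, on the class itself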
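
As a rough, runnable version of the pattern above: `build_reference_frame` is a hypothetical stand-in for whatever construction and processing you actually do, and the columns are invented for the example.

```python
import pandas as pd

class DNA:
    def __init__(self, name="1"):
        self.name = name

def build_reference_frame():
    # Hypothetical stand-in for the real construction/processing steps
    return pd.DataFrame({"base": list("ACGT"), "weight": [1.0, 2.0, 3.0, 4.0]})

DNA.df = build_reference_frame()  # one object, referenced by the class

strands = [DNA(name=str(i)) for i in range(100)]

# Attribute lookup falls through each instance to the class
shared = all(s.df is DNA.df for s in strands)
# No instance keeps its own entry for `df`
stored_per_instance = any("df" in vars(s) for s in strands)
```

Here `shared` is `True` and `stored_per_instance` is `False`, so the 100 instances hold no extra references to the frame, let alone copies of it.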

If you later want to change it, you can rebind it on the class at any point:

DNA.df = new_df
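
A small sketch of what that rebinding does to instances that already exist (the frames and instance names are assumptions for the example):

```python
import pandas as pd

class DNA:
    def __init__(self, name="1"):
        self.name = name

DNA.df = pd.DataFrame({"base": list("ACGT")})
first, second = DNA("first"), DNA("second")

new_df = pd.DataFrame({"base": list("TGCA")})
DNA.df = new_df  # rebind on the class, not on an instance

# Instances created before the rebinding resolve to the new object too
sees_new = first.df is new_df and second.df is new_df
```

`sees_new` is `True`, because the instances never held the old frame themselves; they look `df` up on the class each time it is accessed.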

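Note that an attribute assigned on the class is visible through every instance, including instances created before the assignment: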

In [5]: class A:
   ...:     pass
   ...:

In [6]: a1 = A()

In [7]: a2 = A()

In [8]: a3 = A()

In [9]: A.class_member = 42

In [10]: a1.class_member
Out[10]: 42

In [11]: a2.class_member
Out[11]: 42

In [12]: a3.class_member
Out[12]: 42

Be careful, though: when you assign to an instance, Python takes you at your word:

In [14]: a2.class_member = 'foo' # this shadows the class variable with an instance variable in this instance...

In [15]: a1.class_member
Out[15]: 42

In [16]: a2.class_member # really an instance variable now!
Out[16]: 'foo'

And that is reflected by examining the namespace of the instances and the class object itself:

In [17]: a1.__dict__
Out[17]: {}

In [18]: a2.__dict__
Out[18]: {'class_member': 'foo'}

In [19]: A.__dict__
Out[19]:
mappingproxy({'__dict__': <attribute '__dict__' of 'A' objects>,
              '__doc__': None,
              '__module__': '__main__',
              '__weakref__': <attribute '__weakref__' of 'A' objects>,
              'class_member': 42})
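
The same shadowing applies to the DataFrame case. The following sketch (with made-up frames and instance names) shows the pitfall and one way to undo it, namely deleting the instance attribute:

```python
import pandas as pd

class DNA:
    def __init__(self, name="1"):
        self.name = name

DNA.df = pd.DataFrame({"base": list("ACGT")})
a, b = DNA("a"), DNA("b")

b.df = pd.DataFrame({"base": list("TGCA")})  # creates an instance attribute on b only

shadowed = "df" in vars(b)      # True: b now carries its own reference
a_unaffected = a.df is DNA.df   # True: a still resolves through the class

del b.df                        # removing the instance attribute ...
restored = b.df is DNA.df       # ... exposes the class attribute again (True)
```

So to keep a single shared frame, always assign through the class (`DNA.df = ...`), never through an instance or `self`.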
juanpa.arrivillaga Avatar answered Mar 03 '26 04:03



