I have a Python class called DNA. I want to create 100 instances of DNA. Each instance contains a pandas DataFrame that is identical across all instances. To avoid duplication, I want to store this DataFrame as a static/class attribute.
import pandas as pd

some_df = pd.DataFrame()

class DNA(object):
    df = some_variable  # Do I declare here?

    def __init__(self, df=pd.DataFrame(), name='1'):
        self.name = name
        self.instance_df = instance_df  # I want to avoid this
        DNA.some_df = df  # Does this duplicate the data for every instance?
What is the correct way to do this?
I want to be able to change the dataframe that I use as a class variable but once the class is loaded, it needs to reference the same value (i.e. the same memory) in all instances.
I've answered your questions in the code comments:
import pandas as pd

some_df = pd.DataFrame()

class DNA(object):
    df = some_variable  # You assign here. I would use `some_df`

    def __init__(self, df=pd.DataFrame(), name='1'):
        self.name = name
        self.instance_df = instance_df  # Yes, avoid this
        DNA.some_df = df  # This does not duplicate: assignment **never copies in Python**. However, I advise against this
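To check the claim that assignment does not copy, here is a minimal sketch. The `Holder` class and the sample data are made up for illustration and are not part of the question's code.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
some_df = pd.DataFrame({"base": list("ACGT")})


class Holder:
    # Binds the class attribute to the existing object; no copy is made.
    df = some_df


alias = some_df  # a second name for the same object

print(alias is some_df)        # True
print(Holder.df is some_df)    # True
print(Holder().df is some_df)  # True: instance lookup falls back to the class
```

The `is` operator compares object identity, so `True` here means all three names refer to the same DataFrame in memory.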
So, using
DNA.some_df = df
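inside __init__ does work. Because default arguments are evaluated only once, at function definition time, that df is always the same object unless you explicitly pass a different df to __init__. Note, though, that every call to __init__ rebinds the class attribute, so passing a different df when creating one instance changes it for all of them, which smacks of bad design to me. Rather, you probably want something like: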
class DNA(object):
    def __init__(self, df=pd.DataFrame(), name='1'):
        self.name = name

# <some work to construct a dataframe>
df = final_processing_function()
DNA.df = df
If you later want to change it, you can rebind the class attribute at any point:
DNA.df = new_df
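The points above can be checked with a short, self-contained sketch. It assumes a simplified `DNA` class without the `df` parameter, plus a hypothetical helper `default_arg` and made-up data; none of these names come from the original code.

```python
import pandas as pd


class DNA:
    df = None  # shared class attribute, bound after the class is defined

    def __init__(self, name="1"):
        self.name = name


def default_arg(df=pd.DataFrame()):
    # The default DataFrame is created once, when the `def` statement runs.
    return df


DNA.df = pd.DataFrame({"base": list("ACGT")})  # hypothetical data
instances = [DNA(name=str(i)) for i in range(100)]

# Every instance resolves `df` to the single object stored on the class.
print(all(inst.df is DNA.df for inst in instances))  # True
# Repeated calls return the same default object.
print(default_arg() is default_arg())                # True

new_df = pd.DataFrame({"base": list("TTAA")})
DNA.df = new_df  # rebind on the class, not on an instance
print(all(inst.df is new_df for inst in instances))  # True
```

Because none of the instances has its own `df` entry, rebinding `DNA.df` is immediately visible through all 100 of them.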
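Note that a class attribute assigned after the instances were created is still visible through every instance: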
In [5]: class A:
   ...:     pass
   ...:
In [6]: a1 = A()
In [7]: a2 = A()
In [8]: a3 = A()
In [9]: A.class_member = 42
In [10]: a1.class_member
Out[10]: 42
In [11]: a2.class_member
Out[11]: 42
In [12]: a3.class_member
Out[12]: 42
Be careful, though. When you assign to an attribute on an instance, Python takes you at your word:
In [14]: a2.class_member = 'foo' # this shadows the class variable with an instance variable in this instance...
In [15]: a1.class_member
Out[15]: 42
In [16]: a2.class_member # really an instance variable now!
Out[16]: 'foo'
And that is reflected by examining the namespace of the instances and the class object itself:
In [17]: a1.__dict__
Out[17]: {}
In [18]: a2.__dict__
Out[18]: {'class_member': 'foo'}
In [19]: A.__dict__
Out[19]:
mappingproxy({'__dict__': <attribute '__dict__' of 'A' objects>,
              '__doc__': None,
              '__module__': '__main__',
              '__weakref__': <attribute '__weakref__' of 'A' objects>,
              'class_member': 42})
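The same distinction matters for a shared DataFrame. The sketch below is illustrative only, with made-up data and a simplified `DNA` class: mutating the DataFrame in place through an instance affects the one shared object, while assigning to the attribute on an instance only shadows it for that instance.

```python
import pandas as pd


class DNA:
    df = pd.DataFrame({"base": list("ACGT")})  # hypothetical shared data


d1, d2 = DNA(), DNA()

# In-place mutation goes through the shared object, so every instance sees it.
d1.df["position"] = list(range(4))
print(list(d2.df.columns))  # ['base', 'position']

# Assignment on an instance creates an instance attribute that shadows the class one.
d1.df = pd.DataFrame()
print("df" in d1.__dict__)  # True
print(d2.df is DNA.df)      # True

# Deleting the instance attribute makes lookup fall back to the class again.
del d1.df
print(d1.df is DNA.df)      # True
```

If the goal is to swap the DataFrame for all instances, assign to `DNA.df` rather than to `some_instance.df`.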