Maybe this is more of a theoretical language question rather than pandas per-se. I have a set of function extensions that I'd like to "attach" to e.g. a pandas DataFrame without explicitly calling utility functions and passing the DataFrame as an argument i.e. to have the syntactic sugar. Extending Pandas DataFrame is also not a choice because of the inaccessible types needed to define and chain the DataFrame contructor e.g. Axes
and Dtype
.
In Scala one can define an implicit class to attach functionality to an otherwise unavailable or too-complex-to-initialize object e.g. the String type can't be extended in Java AFAIR. For example the following attaches a function to a String type dynamically https://www.oreilly.com/library/view/scala-cookbook/9781449340292/ch01s11.html
scala> implicit class StringImprovements(s: String) {
def increment = s.map(c => (c + 1).toChar)
}
scala> val result = "HAL".increment
result: String = IBM
Likewise, I'd like to be able to do:
# somewhere in scope
def lexi_sort(df):
"""Lexicographically sorts the input pandas DataFrame by index and columns"""
df.sort_index(axis=0, level=df.index.names, inplace=True)
df.sort_index(axis=1, level=df.columns.names, inplace=True)
return df
df = pd.DataFrame(...)
# some magic and then ...
df.lexi_sort()
One valid possibility is to use the Decorator Pattern but I was wondering whether Python offered a less boiler-plate language alternative like Scala does.
In pandas, you can do:
def lexi_sort(df):
"""Lexicographically sorts the input pandas DataFrame by index and columns"""
df.sort_index(axis=0, level=df.index.names, inplace=True)
df.sort_index(axis=1, level=df.columns.names, inplace=True)
return df
pd.DataFrame.lexi_sort = lexi_sort
df = pd.read_csv('dummy.csv')
df.lexi_sort()
I guess for other objects you can define a method within the class to achieve the same outcome.
class A():
def __init__(self, df:pd.DataFrame):
self.df = df
self.n = 0
def lexi_sort(self):
"""Lexicographically sorts the input pandas DataFrame by index and columns"""
self.df.sort_index(axis=0, level=self.df.index.names, inplace=True)
self.df.sort_index(axis=1, level=self.df.columns.names, inplace=True)
return df
def add_one(self):
self.n += 1
a = A(df)
print(a.n)
a.add_one()
print(a.n)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With