I have a DataFrame that looks like this:
import pandas as pd

df = {'col_1': [1,2,3,4,5,6,7,8,9,10],
      'col_2': [1,2,3,4,5,6,7,8,9,10],
      'col_3': ['A','A','A','A','A','B','B','B','B','B']}
df = pd.DataFrame(df)
The real data I'm using has hundreds of columns. I want to manipulate these columns using different functions like min and max, as well as self-defined functions like:
def dist(x):
    return max(x) - min(x)

def HHI(x):
    ss = sum([s**2 for s in x])
    return ss
Instead of writing many lines, I want to have a function like:
def myfunc(cols, fun):
    return df.groupby('col_3')[[cols]].transform(lambda x: fun)

# which allows me to do something like:
df[['min_' + s for s in cols]] = myfunc(cols, min)
df[['max_' + s for s in cols]] = myfunc(cols, max)
df[['dist_' + s for s in cols]] = myfunc(cols, dist)
Is this possible in Python (my guess is 'yes')? If so, how?
EDIT ====== ABOUT NAME OF SELF-DEFINED FUNCTION =======
According to jpp's solution, what I asked for is possible, at least for built-in functions; self-defined functions need more work.
A workable solution:
temp = df.copy()
for func in ['HHI', 'dist']:
    print(func)
    temp[[func + s for s in cols]] = df.pipe(myfunc, cols, eval(func))
The key here is to use the eval function to convert a string expression into a function. However, there may be a better way to do this; I look forward to seeing one.
EDIT ====== per jpp's comment about name of self-defined function =======
jpp's comment, which feeds the function object directly to myfunc, is valid based on my test. However, a new column name built from func will then look something like <function HHI at 0x00000194460019D8>, which is not very readable. The fix is temp[[str(func.__name__) + s for s in cols]]. I hope this helps those who come to this problem later.
In Python you can pass function objects into other functions. In fact, some functions built into Python expect one or more of their arguments to be functions so that they can call them later.
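A small pure-Python illustration: the built-ins `sorted` and `map` both take a function argument, and `apply_twice` is a hypothetical helper showing that your own functions can accept functions the same way:

```python
def dist(x):
    # largest element minus smallest element
    return max(x) - min(x)

def apply_twice(fun, value):
    # hypothetical helper: fun is an ordinary function object received as an argument
    return fun(fun(value))

words = ['pandas', 'df', 'groupby']
by_length = sorted(words, key=len)          # sorted calls len on each item
ranges = list(map(dist, [[1, 5], [2, 9]]))  # map calls dist on each list
tripled_twice = apply_twice(lambda v: v * 3, 2)

print(by_length)      # ['df', 'pandas', 'groupby']
print(ranges)         # [4, 7]
print(tripled_twice)  # 18
```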
Here's one way using pd.DataFrame.pipe.
In Python everything is an object and can be passed around without type-checking. The philosophy is "Don't check if it works, just try it...". Hence you can pass either a string or a function to myfunc and from there to transform without any harmful side effects, since transform accepts either the name of a groupby method (such as 'min') or a callable.
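A quick check of that claim, assuming pandas is installed: transforming by the method name `'min'` and by a callable that computes the same minimum produce the same values.

```python
import pandas as pd

df = pd.DataFrame({'col_1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'col_2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'col_3': ['A'] * 5 + ['B'] * 5})
grouped = df.groupby('col_3')[['col_1', 'col_2']]

# string: pandas dispatches to its own groupby 'min' method
by_name = grouped.transform('min')
# callable: pandas calls the function on the group data
by_callable = grouped.transform(lambda x: x.min())

print(by_name.to_numpy().tolist() == by_callable.to_numpy().tolist())  # True
```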
def myfunc(df, cols, fun):
    # group rows by col_3 and transform each selected column with fun
    return df.groupby('col_3')[cols].transform(fun)

cols = ['col_1', 'col_2']
# a string names a groupby method; a callable is applied to each group
df[[f'min_{s}' for s in cols]] = df.pipe(myfunc, cols, 'min')
df[[f'max_{s}' for s in cols]] = df.pipe(myfunc, cols, 'max')
df[[f'dist_{s}' for s in cols]] = df.pipe(myfunc, cols, lambda x: x.max() - x.min())
Result:
print(df)
   col_1  col_2  col_3  min_col_1  min_col_2  max_col_1  max_col_2  dist_col_1  dist_col_2
0      1      1      A          1          1          5          5           4           4
1      2      2      A          1          1          5          5           4           4
2      3      3      A          1          1          5          5           4           4
3      4      4      A          1          1          5          5           4           4
4      5      5      A          1          1          5          5           4           4
5      6      6      B          6          6         10         10           4           4
6      7      7      B          6          6         10         10           4           4
7      8      8      B          6          6         10         10           4           4
8      9      9      B          6          6         10         10           4           4
9     10     10      B          6          6         10         10           4           4
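With hundreds of columns, the three assignments can also be driven from one mapping of prefix to function and joined with `pd.concat`. A sketch assuming pandas is installed, rebuilding the example frame and the answer's `myfunc`; `specs` is a hypothetical name, and the prefixes reproduce the column names printed above:

```python
import pandas as pd

df = pd.DataFrame({'col_1': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'col_2': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
                   'col_3': ['A'] * 5 + ['B'] * 5})
cols = ['col_1', 'col_2']

def myfunc(df, cols, fun):
    return df.groupby('col_3')[cols].transform(fun)

# hypothetical mapping: column prefix -> method name or callable
specs = {
    'min': 'min',
    'max': 'max',
    'dist': lambda x: x.max() - x.min(),
}

# each transform keeps df's index, so the pieces line up row by row
parts = [myfunc(df, cols, fun).add_prefix(f'{name}_') for name, fun in specs.items()]
out = pd.concat([df] + parts, axis=1)
print(out.columns.tolist())
```

Adding another statistic is then one more entry in `specs` rather than another assignment line.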