Is there a reduce/fold implementation for Pandas DataFrames?
For example, I want to get the sum of the numbers in the column named cost in dataframe df, using something like lambda acc, x, where x is a DataFrame row.
What should I do?
P.S. I know about .sum(), but there are many other possible \acc,x -> ... functions.
The melt() function is useful to massage a DataFrame into a format where one or more columns are identifier variables, while all other columns, considered measured variables, are unpivoted to the row axis, leaving just two non-identifier columns, variable and value.
The Pandas melt() function changes the DataFrame format from wide to long. It creates a DataFrame in which one or more columns work as identifiers. All the remaining columns are treated as values and unpivoted to the row axis, leaving only two columns: variable and value.
Return a copy of the array collapsed into one dimension. The order argument controls whether to flatten in C (row-major) order, Fortran (column-major) order, or to preserve the C/Fortran ordering of the array a. The default is 'C'.
A fast fold is available in the following way (replace plus with your own function):
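import numpy as np

def accum(op, ser):
    # Wrap the binary Python function as a ufunc: two inputs, one output
    u_op = np.frompyfunc(op, 2, 1)
    # np.object was removed from NumPy; the builtin object is the equivalent dtype
    return u_op.accumulate(ser, dtype=object)

def plus(x, y):
    return x + y

accum(plus, np.arange(10))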
You get:
array([0, 1, 3, 6, 10, 15, 21, 28, 36, 45], dtype=object)
This works for NumPy arrays, and hence also for individual columns of pandas DataFrames. It would be interesting to have a solution that works directly on DataFrames, so that multiple columns can be combined.
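One possible way to fold over whole rows, as the question asks, is to combine functools.reduce with DataFrame.itertuples. This is only a minimal sketch: the DataFrame contents and the helper name add_cost are made up for illustration, and a plain Python loop like this will be much slower than .sum() or other vectorized methods on large frames.

```python
from functools import reduce

import pandas as pd

# Hypothetical example data; only the column name "cost" comes from the question.
df = pd.DataFrame({"item": ["a", "b", "c"], "cost": [10, 20, 30]})

def add_cost(acc, row):
    # row is a namedtuple, so every column of the row is available as an attribute
    return acc + row.cost

# Fold over the rows, starting from an initial accumulator of 0
total = reduce(add_cost, df.itertuples(index=False), 0)
print(total)
```

Because each row arrives as a namedtuple, the folding function can read several columns at once (for example row.cost together with another column), which is what the column-wise accumulate approach cannot do directly.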