reduce (fold) in Pandas

Are there any reduce/fold implementations for a pandas DataFrame? For example, I want to get the sum of the numbers in the column named cost in the dataframe df, using something like lambda acc, x, where x is a DataFrame row.

What should I do?

P.S. I know about .sum(), but there are many other possible \acc,x -> ... functions.

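A fast fold is available as follows (replace plus with your own function). Strictly speaking, accumulate returns every intermediate result, so the fold itself is the last element of the output: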

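import numpy as np

def accum(op, ser):
    # Wrap the Python function as a ufunc: two inputs, one output
    u_op = np.frompyfunc(op, 2, 1)
    # np.object was removed in NumPy 1.24; the builtin object is equivalent
    return u_op.accumulate(ser, dtype=object)

def plus(x, y):
    return x + y

accum(plus, np.arange(10))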

You get:

array([0, 1, 3, 6, 10, 15, 21, 28, 36, 45], dtype=object)
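If only the final value is needed, the same ufunc's reduce method can be used instead of accumulate. The following is a minimal sketch, not part of the answer above: the helper name fold, the DataFrame df, and its data are made up for illustration.

```python
import numpy as np
import pandas as pd

def fold(op, values):
    """Fold a two-argument Python function over a 1-D sequence (hypothetical helper)."""
    u_op = np.frompyfunc(op, 2, 1)  # two inputs, one output
    # Convert to an object array so the object-typed ufunc loop applies directly
    arr = np.asarray(values, dtype=object)
    return u_op.reduce(arr)

# Made-up example data
df = pd.DataFrame({"cost": [10, 20, 30, 40]})

# Sum of the cost column, written as an explicit fold
total = fold(lambda acc, x: acc + x, df["cost"].to_numpy())

# Any other two-argument function works the same way, e.g. a running maximum
largest = fold(lambda acc, x: acc if acc > x else x, df["cost"].to_numpy())
```

A ufunc built with frompyfunc in this way has no identity element, so an empty column should be handled separately before calling reduce.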

This works on NumPy arrays, and hence also on the individual columns of a pandas DataFrame. It would be interesting to have a solution that works directly on DataFrames, so that multiple columns can be combined.
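One possible way to fold over whole rows, so that several columns take part in each step, is functools.reduce over df.itertuples(). This is a plain Python loop rather than the ufunc approach above, so treat it as a sketch for clarity, not speed. The columns cost and qty and their values are made up for illustration.

```python
from functools import reduce

import pandas as pd

# Made-up example data
df = pd.DataFrame({"cost": [10, 20, 30], "qty": [1, 2, 3]})

# acc is the running result; row is a namedtuple with one field per column
total_cost = reduce(
    lambda acc, row: acc + row.cost,
    df.itertuples(index=False),
    0,  # initial accumulator, also the result for an empty DataFrame
)

# Combining two columns in each step
weighted = reduce(
    lambda acc, row: acc + row.cost * row.qty,
    df.itertuples(index=False),
    0,
)
```

Passing an explicit initial value keeps the accumulator type independent of the row type, which is what the lambda acc, x shape in the question needs.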

Tags: python, pandas

fevgenym

1 Answers

tillmo
