
reduce (fold) in Pandas

Tags:

python

pandas

Is there a reduce/fold implementation for a Pandas DataFrame? For example, I want to get the sum of the numbers in the column named cost in the DataFrame df, using something like lambda acc, x, where x is a DataFrame row.

What should I do?

P.S. I know about .sum(), but there are many other possible \acc,x -> ... functions.

fevgenym asked Mar 28 '17 16:03




1 Answer

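A fold can be built from a NumPy ufunc in the following way (replace plus with your own function). Note that accumulate returns every intermediate result, so the last element is the value of the fold: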

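import numpy as np

def accum(op, ser):
    # Wrap the Python function as a ufunc: two inputs, one output.
    u_op = np.frompyfunc(op, 2, 1)
    # np.object was removed in NumPy 1.24; use the builtin object instead.
    return u_op.accumulate(ser, dtype=object)

def plus(x, y):
    return x + y

accum(plus, np.arange(10))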

You get:

array([0, 1, 3, 6, 10, 15, 21, 28, 36, 45], dtype=object)
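If only the final value is needed rather than the running results, the same wrapped function can be reduced instead of accumulated. The following is a minimal sketch, not part of the original answer: the helper name fold and the sample data are made up for illustration, and the cost column is borrowed from the question.

```python
import numpy as np
import pandas as pd

def fold(op, values):
    """Reduce `values` to a single result using a two-argument Python function."""
    u_op = np.frompyfunc(op, 2, 1)  # two inputs, one output
    # Convert to an object array so the function may return arbitrary Python values.
    # Note: reducing an empty array raises ValueError because no identity is defined.
    return u_op.reduce(np.asarray(values, dtype=object))

# Hypothetical sample data using the "cost" column from the question.
df = pd.DataFrame({"cost": [3, 5, 7]})
total = fold(lambda acc, x: acc + x, df["cost"].to_numpy())
print(total)
```

The function is still called once per element in Python, so this is unlikely to match the speed of a built-in such as .sum().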

This works for NumPy arrays, and hence also for individual columns of pandas DataFrames. It would be interesting to have a solution that works directly on DataFrames, so that multiple columns can be combined.
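One way to fold over whole rows, so that several columns can take part in each step, is to combine functools.reduce from the standard library with DataFrame.itertuples. This is only a sketch under assumptions: the qty column and the step function are hypothetical, and the approach is a plain Python loop rather than the NumPy-based technique shown above.

```python
from functools import reduce

import pandas as pd

# Hypothetical data: "cost" comes from the question, "qty" is invented for illustration.
df = pd.DataFrame({"cost": [3, 5, 7], "qty": [1, 2, 3]})

def step(acc, row):
    # `row` is a namedtuple with one field per column.
    return acc + row.cost * row.qty

# The third argument is the initial accumulator, so an empty DataFrame yields 0.
total = reduce(step, df.itertuples(index=False), 0)
print(total)
```

Passing an explicit initial value keeps the accumulator type independent of the row type, which matches the lambda acc, x shape asked for in the question.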

tillmo answered Oct 29 '22 16:10