I currently have a dataset where I'm trying to group rows based on a column and sum the columns whose values are integers.
However, the catch is that I would like to create a new row once the sum has reached a certain threshold.
For example, in the dataframe below, I am trying to group the rows by company name and sum up the weights; however, I do not want any summed weight to exceed 100.
Input dataframe:
| Company | Weight |
|---------|--------|
| a | 30 |
| b | 45 |
| a | 27 |
| a | 40 |
| b | 57 |
| a | 57 |
| b | 32 |
Output dataframe:
| Company | Weight |
|---------|--------|
| a | 97 |
| a | 57 |
| b | 89 |
| b | 45 |
I have tried using groupby and sum; however, they cannot detect whether or not I have reached the maximum amount.
Is there any way I can achieve this?
Any help would be greatly appreciated!
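For reference, here is the input as a DataFrame and what a plain `groupby().sum()` produces. It collapses each company into a single row and ignores the threshold entirely, which is why a custom grouping step is needed:

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["a", "b", "a", "a", "b", "a", "b"],
    "Weight":  [30, 45, 27, 40, 57, 57, 32],
})

# A plain groupby().sum() gives one row per company,
# with no awareness of the 100 cap:
print(df.groupby("Company")["Weight"].sum())
# Company
# a    154
# b    134
```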
I think loops are necessary here, so to improve performance use numba. This is a modified solution from Divakar: the function is called per group via GroupBy.transform, and the result is then aggregated with sum:
import numpy as np
from numba import njit

@njit
def make_groups(x, target):
    # Assign a bucket id to each value; start a new bucket
    # once the running total reaches the target.
    result = np.empty(len(x), dtype=np.uint64)
    total = 0
    group = 0
    for i, x_i in enumerate(x):
        total += x_i
        if total >= target:
            group += 1
            total = 0
        result[i] = group
    return result

g = df.groupby("Company")["Weight"].transform(lambda x: make_groups(x.to_numpy(), 100))
df1 = (df.groupby(by=["Company", g])
         .sum()
         .reset_index(1, drop=True)
         .sort_values(['Company', 'Weight'], ascending=[True, False])
         .reset_index())
print(df1)
Company Weight
0 a 97
1 a 57
2 b 89
3 b 45
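If numba isn't available, a minimal sketch of the same bucketing logic in plain Python (slower on large frames, same grouping rule) might look like this:

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["a", "b", "a", "a", "b", "a", "b"],
    "Weight":  [30, 45, 27, 40, 57, 57, 32],
})

def make_groups(weights, target):
    # Assign a bucket id to each row; start a new bucket
    # once the running total reaches the target.
    result = []
    total = group = 0
    for w in weights:
        total += w
        if total >= target:
            group += 1
            total = 0
        result.append(group)
    return result

g = df.groupby("Company")["Weight"].transform(lambda x: make_groups(x, 100))
out = (df.groupby(["Company", g.rename("bucket")])["Weight"]
         .sum()
         .reset_index(level="bucket", drop=True)
         .reset_index())
print(out)
```

Sorting within each company, as in the answer above, is cosmetic; the per-bucket sums are the same either way.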
Well, it depends. If you want the optimal packing (each group's sum as close to 100 as possible), you're asking an NP-hard problem (bin packing). There are a few algorithms for it, but none are O(n), which is what groupby plus sum gives you. Say you iterate with iterrows() (try to avoid that): could you do it in one pass? If you are not looking for an optimal solution (closest to 100 in each group), there is an option:
for every company, sort the weights in increasing order, then iterate, opening a new row every time the running sum reaches 100, accumulating in a side variable and replacing the original rows at the end.
There isn't a pandas / Numpy standard solution that I know of.
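A sketch of that greedy idea, under the assumption that every bucket must stay at or under the cap (so the row sums can differ from the transform-based answer above, which starts a new bucket only after crossing the threshold):

```python
import pandas as pd

df = pd.DataFrame({
    "Company": ["a", "b", "a", "a", "b", "a", "b"],
    "Weight":  [30, 45, 27, 40, 57, 57, 32],
})

def fill_buckets(weights, cap=100):
    # Greedily pack ascending-sorted weights into buckets
    # whose sums never exceed cap.
    buckets = []
    total = cap + 1  # force a new bucket on the first weight
    for w in sorted(weights):
        if total + w > cap:
            buckets.append(w)  # open a new bucket
            total = w
        else:
            buckets[-1] += w   # add to the current bucket
            total += w
    return buckets

rows = [(company, s)
        for company, grp in df.groupby("Company")["Weight"]
        for s in fill_buckets(grp)]
out = pd.DataFrame(rows, columns=["Company", "Weight"])
print(out)
```

On the example data this yields buckets of 97 and 57 for company a, and 77 and 57 for company b; every bucket is at most 100, but the packing is greedy, not optimal.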