Cumulative sum for more values in one entry

Let's say I have this dataframe (the "number" variable is also of character type in the original dataframe):

df <- data.frame(
  id = c(1,2,2,1,2),
  number = c(30.6, "50.2/15.5", "45/58.4", 80, "57/6"))
df$number <- as.character(df$number)

Now I want to add another column with the cumulative sum for each ID. I tried df %>% mutate(csum = ave(number, id, FUN=cumsum)), which works for the single numbers but, of course, not for the entries that contain two numbers separated by "/". How can I solve this problem?

The final dataframe should be like this:

df2 <- data.frame(
  id = c(1,2,2,1,2),
  number = c(30.6, "50.2/15.5", "45/58.4", 80, "57/6"),
  csum = c(30.6, "50.2/15.5", "95.2/73.9", 110.6, "152.2/79.9"))
df2

Katharina asked Dec 24 '21 09:12


3 Answers

One way could be:

  1. group by id with group_by
  2. split number into columns a and b with separate
  3. mutate across a and b and apply cumsum
  4. recombine a and b with unite from the tidyr package, using the na.rm = TRUE argument to drop the missing b values

library(dplyr)
library(tidyr)

df %>% 
  group_by(id) %>% 
  separate(number, c("a", "b"), sep="/", remove = FALSE, convert = TRUE) %>% 
  mutate(across(c(a,b), ~cumsum(.))) %>% 
  unite(csum, c(a,b), sep = '/', na.rm = TRUE)

     id number    csum      
  <dbl> <chr>     <chr>     
1     1 30.6      30.6      
2     2 50.2/15.5 50.2/15.5 
3     2 45/58.4   95.2/73.9 
4     1 80        110.6     
5     2 57/6      152.2/79.9

TarJae answered Oct 18 '22 19:10


You could use the extremely fast matrixStats::colCumsums.

res <- do.call(rbind, by(df, df$id, \(x) {
  cs <- matrixStats::colCumsums(do.call(rbind, strsplit(x$number, '/')) |> 
                                  type.convert(as.is=TRUE))
  r <- do.call(paste, c(as.list(as.data.frame(cs)), sep='/'))
  data.frame(id=x$id, number=x$number, csum=r)
}))

Note: the \(x) lambda shorthand and the native pipe |> require R >= 4.1; this was run on R version 4.1.2 (2021-11-01).

Gives:

res
#     id    number       csum
# 1.1  1      30.6       30.6
# 1.2  1        80      110.6
# 2.1  2 50.2/15.5  50.2/15.5
# 2.2  2   45/58.4  95.2/73.9
# 2.3  2      57/6 152.2/79.9

jay.sf answered Oct 18 '22 20:10


We could use base R: read the 'number' column with read.table to split it into two columns ('d1'), create a logical vector ('i1') marking the rows without NAs, subset those rows of 'd1', loop over the columns to get the cumulative sum (cumsum), paste the results together, and assign them to a new column 'csum' in the original dataset. Note that this does not group by 'id', and rows with a single value are left as NA in 'csum'.

d1 <- read.table(text = df$number, sep = "/", fill = TRUE, header = FALSE)
i1 <- !rowSums(is.na(d1)) > 0
df$csum[i1] <-  do.call(paste, c(lapply(d1[i1,], cumsum), sep = "/"))

Output:

> df
  id    number       csum
1  1      30.6       <NA>
2  2 50.2/15.5  50.2/15.5
3  2   45/58.4  95.2/73.9
4  1        80       <NA>
5  2      57/6 152.2/79.9
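
The output above leaves NA for the rows with a single value, and the cumulative sum is not computed per id. A minimal base R sketch that covers both might look like the following. The helper name cumsum_slash is hypothetical (it is not from any of the answers), and it assumes that every entry within one id has the same number of "/"-separated parts, as in the example data.

```r
# Example data from the question, with 'number' stored as character.
df <- data.frame(
  id = c(1, 2, 2, 1, 2),
  number = c("30.6", "50.2/15.5", "45/58.4", "80", "57/6"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: cumulative sum per "/"-separated position
# for the entries of a single group.
cumsum_slash <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  # Assumption: all entries in the group have the same number of parts.
  stopifnot(length(unique(lengths(parts))) == 1L)
  # One row per entry, one column per position.
  m <- matrix(as.numeric(unlist(parts)), nrow = length(x), byrow = TRUE)
  # Cumulative sum down each column, then paste the columns back together.
  cols <- unname(lapply(as.data.frame(m), cumsum))
  do.call(paste, c(cols, sep = "/"))
}

# ave() applies the helper within each id and keeps the original row order.
df$csum <- ave(df$number, df$id, FUN = cumsum_slash)
df$csum
# "30.6" "50.2/15.5" "95.2/73.9" "110.6" "152.2/79.9"
```

Because ave() writes each group's result back into its original positions, the rows stay in the order of the input data frame. If a group mixes single and paired entries, the stopifnot() check fails instead of silently producing NA.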

akrun answered Oct 18 '22 20:10