Cumulative sum for more values in one entry

Tags:

dataframe

r

sum

cumsum

Let's say I have this dataframe (the "number" variable is also of character type in the original dataframe):

df <- data.frame(
  id = c(1,2,2,1,2),
  number = c(30.6, "50.2/15.5", "45/58.4", 80, "57/6"))
df$number <- as.character(df$number)

Now I want to add another column with the cumulative sum for each ID. I did this with df %>% mutate(csum = ave(number, id, FUN=cumsum)), which works for the single numbers but, of course, not for the numbers separated by "/". How can I solve this problem?

The final dataframe should be like this:

df2 <- data.frame(
  id = c(1,2,2,1,2),
  number = c(30.6, "50.2/15.5", "45/58.4", 80, "57/6"),
  csum = c(30.6, "50.2/15.5", "95.2/73.9", 110.6, "152.2/79.9"))
df2

asked Dec 24 '21 09:12

Katharina

3 Answers

One way could be:

1. Group by id with group_by
2. separate the number column into columns a and b
3. mutate across a and b, applying cumsum
4. unite a and b (unite is from the tidyr package) using the na.rm=TRUE argument

library(dplyr)
library(tidyr)

df %>% 
  group_by(id) %>% 
  separate(number, c("a", "b"), sep="/", remove = FALSE, convert = TRUE) %>% 
  mutate(across(c(a,b), ~cumsum(.))) %>% 
  unite(csum, c(a,b), sep = '/', na.rm = TRUE)

     id number    csum      
  <dbl> <chr>     <chr>     
1     1 30.6      30.6      
2     2 50.2/15.5 50.2/15.5 
3     2 45/58.4   95.2/73.9 
4     1 80        110.6     
5     2 57/6      152.2/79.9
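
A possible variant of the pipeline above, offered as a sketch rather than as the answerer's code: separate() warns about missing pieces for the single-number rows, and the result stays grouped. Assuming that is unwanted, the fill = "right" argument of separate() and a final ungroup() address both. The name out is only illustrative.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  id = c(1, 2, 2, 1, 2),
  number = c("30.6", "50.2/15.5", "45/58.4", "80", "57/6"),
  stringsAsFactors = FALSE
)

out <- df %>%
  # fill = "right" pads single-number rows with NA in b without a warning
  separate(number, c("a", "b"), sep = "/", remove = FALSE,
           convert = TRUE, fill = "right") %>%
  group_by(id) %>%
  mutate(across(c(a, b), cumsum)) %>%
  # drop the grouping so later steps operate on the whole table
  ungroup() %>%
  # na.rm = TRUE drops the NA in b, leaving just the single value
  unite(csum, a, b, sep = "/", na.rm = TRUE)

out
```

This still assumes at most two values per entry; more "/"-separated values would need more target columns.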

answered Oct 18 '22 19:10

TarJae

You could use the extremely fast matrixStats::colCumsums.

res <- do.call(rbind, by(df, df$id, \(x) {
  cs <- matrixStats::colCumsums(do.call(rbind, strsplit(x$number, '/')) |> 
                                  type.convert(as.is=TRUE))
  r <- do.call(paste, c(as.list(as.data.frame(cs)), sep='/'))
  data.frame(id=x$id, number=x$number, csum=r)
}))

Note: the \(x) lambda shorthand and the native |> pipe require R >= 4.1.0; this was run on R version 4.1.2 (2021-11-01).

Gives:

res
#     id    number       csum
# 1.1  1      30.6       30.6
# 1.2  1        80      110.6
# 2.1  2 50.2/15.5  50.2/15.5
# 2.2  2   45/58.4  95.2/73.9
# 2.3  2      57/6 152.2/79.9

answered Oct 18 '22 20:10

jay.sf

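We could use base R: read the 'number' column with read.table to split it into two columns ('d1'), create a logical vector marking the rows that have no NAs, subset those rows of 'd1', loop over the columns to get the cumulative sum (cumsum), paste the results together, and assign them to a new column 'csum' in the original dataset. Note that this code does not group by 'id'; it matches the expected values here only because all two-value rows belong to id 2, and the single-number rows are left as NA.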

d1 <- read.table(text = df$number, sep = "/", fill = TRUE, header = FALSE)
i1 <- !rowSums(is.na(d1)) > 0
df$csum[i1] <-  do.call(paste, c(lapply(d1[i1,], cumsum), sep = "/"))

Output:

> df
  id    number       csum
1  1      30.6       <NA>
2  2 50.2/15.5  50.2/15.5
3  2   45/58.4  95.2/73.9
4  1        80       <NA>
5  2      57/6 152.2/79.9
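
If the single-number rows should also get a cumulative sum per id, as in the expected df2 from the question, a base R sketch along these lines might work. The helper csum_slash is hypothetical (it is not part of the answer above), and it assumes that all entries within one id have the same number of "/"-separated values.

```r
df <- data.frame(
  id = c(1, 2, 2, 1, 2),
  number = c("30.6", "50.2/15.5", "45/58.4", "80", "57/6"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: cumulative sums of "/"-separated values
# for the rows of ONE group, returned as "/"-joined strings.
csum_slash <- function(x) {
  parts <- strsplit(x, "/", fixed = TRUE)
  # Assumption: every entry in the group has the same number of parts
  stopifnot(length(unique(lengths(parts))) == 1L)
  m <- matrix(as.numeric(unlist(parts)), nrow = length(x), byrow = TRUE)
  # Column-wise cumulative sums; matrix() keeps the shape for 1-row groups
  cs <- matrix(apply(m, 2, cumsum), nrow = nrow(m))
  apply(cs, 1, paste, collapse = "/")
}

# ave() applies the helper per id and keeps the original row order
df$csum <- ave(df$number, df$id, FUN = csum_slash)
df
```

Because ave() writes each group's result back into the original positions, the rows stay in their input order and no re-sorting is needed.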

answered Oct 18 '22 20:10

akrun
