Cumulative count of unique values in R

Tags: r, unique, cumulative-sum

A simplified version of my data set would look like:

depth value
   1     a
   1     b
   2     a
   2     b
   2     b
   3     c

I would like to make a new data set that gives, for each value of "depth", the cumulative number of unique values seen so far, starting from the top. For example:

depth cumsum
 1      2
 2      2
 3      3

Any ideas as to how to do this? I am relatively new to R.

asked Mar 29 '13 06:03

user2223405

1 Answer

This is a good case for using factor with carefully chosen levels. I'll implement the idea with data.table. It helps if your value column is character, although that is not strictly required (a factor also works, as the last example shows).

Step 1: Keep only the unique rows of your data.frame and convert the result to a data.table.

```
require(data.table)
dt <- as.data.table(unique(df))
setkey(dt, "depth") # sort by depth first, so unique(value) in step 2 lists values in order of first appearance from the top
```
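Step 1 assumes the question's data is already in a data.frame called `df`. A minimal base R sketch that builds it (keeping `value` as character, as suggested above) and shows what `unique()` removes:

```r
# The question's sample data; value is kept as character
df <- data.frame(depth = c(1, 1, 2, 2, 2, 3),
                 value = c("a", "b", "a", "b", "b", "c"),
                 stringsAsFactors = FALSE)

# unique() drops the repeated (depth 2, "b") row, leaving 5 rows
u <- unique(df)
print(u)
```

Only the duplicated row goes away; it could not change the count anyway, so this step just shrinks the table before the later steps.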

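Step 2: Convert value to a factor and coerce it to numeric. Set the levels yourself with levels = unique(value) (this is important): because the rows are already sorted by depth, each value then gets a code equal to its order of first appearance from the top. The default levels are sorted alphabetically, which would break this.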
```
dt[, id := as.numeric(factor(value, levels = unique(value)))]
```
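A small sketch of why the levels matter. The question's values happen to appear in alphabetical order, so this uses a made-up vector `value` (hypothetical, standing in for the depth-sorted column) where they don't:

```r
# Hypothetical values, in the order they appear once rows are sorted by depth
value <- c("b", "a", "b", "c")

# Default levels are alphabetical (a, b, c), so codes ignore appearance order
default_codes <- as.numeric(factor(value))
print(default_codes)
# [1] 2 1 2 3

# levels = unique(value) gives b = 1, a = 2, c = 3: order of first appearance
appearance_codes <- as.numeric(factor(value, levels = unique(value)))
print(appearance_codes)
# [1] 1 2 1 3
```

With the default codes, a top depth containing only `"b"` would report a maximum of 2 although only one unique value has been seen, so steps 3 and 4 would overcount.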

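Step 3: Key the table on depth and id, then join on each unique depth and keep only the last matching row (mult="last"). Under that key, the last row is the one with the largest id at that depth. The value column is no longer needed, so drop it.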

```
setkey(dt, "depth", "id")
# for each depth, keep the row with the largest id, then drop value
dt.out <- dt[J(unique(depth)), mult="last"][, value := NULL]

#    depth id
# 1:     1  2
# 2:     2  2
# 3:     3  3
```
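Since the last row of each depth under the key (depth, id) is the one with the largest id, step 3 amounts to a per-depth maximum. A sketch of the same result with data.table's `by =` grouping, using the step 2 codes for the question's data typed in by hand (`dt.max` is just an illustrative name):

```r
library(data.table)

# Unique rows of the question's data after step 2 (a = 1, b = 2, c = 3)
dt <- data.table(depth = c(1, 1, 2, 2, 3),
                 id    = c(1, 2, 1, 2, 3))

# Largest first-appearance code within each depth;
# matches what mult = "last" picks under key (depth, id)
dt.max <- dt[, .(id = max(id)), by = depth]
print(dt.max)
```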

Step 4: The cumulative count can never decrease as depth increases, but the largest id at a given depth can (a deeper level may contain only values that were already seen). Take the running maximum with cummax to get the final output.
```
dt.out[, id := cummax(id)]
```
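A quick illustration of why `cummax` is needed. The vector below holds the per-depth maxima of id for the trickier example at the end of this answer (depths 1 to 6); depth 6 contains only "a", whose code is 1, so its maximum drops:

```r
# Per-depth maxima of id for depths 1..6 of the trickier example
per_depth_max <- c(2, 4, 4, 5, 6, 1)

# The running maximum restores the non-decreasing cumulative count
running <- cummax(per_depth_max)
print(running)
# [1] 2 4 4 5 6 6
```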

Edit: The code above was for illustration. In practice you don't need the third column (id) at all; you can overwrite value directly. This is how I'd write the final code:

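```
require(data.table)
dt <- as.data.table(unique(df))
setkey(dt, "depth")
# recode value as its order of first appearance from the top
dt[, value := as.numeric(factor(value, levels = unique(value)))]
setkey(dt, "depth", "value")
# largest code per depth, then the running maximum down the depths
dt.out <- dt[J(unique(depth)), mult="last"]
dt.out[, value := cummax(value)]
```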

Here's a trickier example and the output from the code:

```
df <- structure(list(depth = c(1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6), 
                value = structure(c(1L, 2L, 3L, 4L, 1L, 3L, 4L, 5L, 6L, 1L, 1L), 
                .Label = c("a", "b", "c", "d", "f", "g"), class = "factor")), 
                .Names = c("depth", "value"), row.names = c(NA, -11L), 
                class = "data.frame")

# dt.out after running the final code on this df:
#    depth value
# 1:     1     2
# 2:     2     4
# 3:     3     4
# 4:     4     5
# 5:     5     6
# 6:     6     6
```
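As a cross-check, here is a different technique in base R (not the factor/data.table approach of this answer): sort by depth, mark the first occurrence of each value with `!duplicated()`, take a running `cumsum()`, and keep the last row of each depth. The data is the example above, rewritten with a character column:

```r
# The trickier example, with value as character
df <- data.frame(depth = c(1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6),
                 value = c("a", "b", "c", "d", "a", "c", "d", "f", "g", "a", "a"),
                 stringsAsFactors = FALSE)

# Sort from the top so "first seen" means "first seen at the smallest depth"
df <- df[order(df$depth), ]

# TRUE the first time each value appears; the running sum counts unique values so far
cum_unique <- cumsum(!duplicated(df$value))

# The running count never decreases, so the last row of each depth holds its total
last_row <- !duplicated(df$depth, fromLast = TRUE)
res <- data.frame(depth = df$depth[last_row], cumsum = cum_unique[last_row])
print(res)
# cumsum column: 2 4 4 5 6 6
```

It reproduces the output above. No `cummax` is needed here, because a running count of first occurrences can only grow.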

answered Oct 28 '22 22:10

Arun
