A simplified version of my data set would look like:
depth value
1 a
1 b
2 a
2 b
2 b
3 c
I would like to make a new data set where, for each value of "depth", I would have the cumulative number of unique values, starting from the top. e.g.
depth cumsum
1 2
2 2
3 3
Any ideas as to how to do this? I am relatively new to R.
To find the unique values in a column of a data frame, use the unique() function in R. It returns the distinct values with duplicates dropped and leaves the original data unchanged, which makes it a staple of exploratory data analysis.
To find the number of unique values in each row of an R data frame, we can use apply function with length and unique function.
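As a minimal sketch of both ideas, the snippet below calls unique() on one column and then counts distinct values per row with apply(). The data frame `toy` and its column names are made up for illustration and are not part of the question's data.

```r
# Hypothetical toy data; names are illustrative only.
toy <- data.frame(x = c("a", "b", "a"),
                  y = c("a", "c", "a"),
                  z = c("b", "c", "a"),
                  stringsAsFactors = FALSE)

# Distinct values in a single column, in order of first appearance.
col_unique <- unique(toy$x)

# Number of distinct values in each row (MARGIN = 1 means "by row").
row_counts <- apply(toy, 1, function(r) length(unique(r)))

print(col_unique)
print(row_counts)
```

Note that apply() converts the data frame to a matrix first, so this works best when all columns share one type.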
The Unique Count measure gives the number of unique (distinct) values in a column. Empty values are not counted. In the table below, column A has a unique count of two and column B has a unique count of three.
I find this a perfect case for using factor and setting levels carefully. I'll use data.table here with this idea. Make sure your value column is character (not an absolute requirement).
step 1: Convert your data.frame to a data.table, keeping just the unique rows.
require(data.table)
dt <- as.data.table(unique(df))
setkey(dt, "depth") # just to be sure before factoring "value"
step 2: Convert value to a factor and coerce it to numeric. Make sure to set the levels yourself (this is important).
dt[, id := as.numeric(factor(value, levels = unique(value)))]
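To see why setting the levels matters, here is a small base R sketch (the vector `value` below is a made-up example, not the question's data). By default factor() sorts its levels alphabetically, whereas levels = unique(value) numbers each value by the order in which it first appears, which is what makes the code act as a running count of distinct values.

```r
# Hypothetical input where first-appearance order differs from alphabetical order.
value <- c("b", "a", "b", "c")

# Default levels are sorted ("a", "b", "c"), so "b" gets code 2.
default_codes <- as.numeric(factor(value))

# Levels in order of first appearance ("b", "a", "c"), so "b" gets code 1.
ordered_codes <- as.numeric(factor(value, levels = unique(value)))

print(default_codes)  # 2 1 2 3
print(ordered_codes)  # 1 2 1 3
```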
step 3: Set the key columns to depth and id, so that rows are sorted by id within each depth, and just pick the last row for each depth.
setkey(dt, "depth", "id")
dt.out <- dt[J(unique(depth)), mult="last"][, value := NULL]
# depth id
# 1: 1 2
# 2: 2 2
# 3: 3 3
step 4: Since the count at each depth must be at least as large as the count at the previous depth, use cummax to get the final output.
dt.out[, id := cummax(id)]
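The need for cummax is easiest to see when a deeper level contains only values that were first seen earlier. The sketch below uses a hypothetical vector of per-depth "last id" values (not produced by the code above) in which the final depth holds only the very first value.

```r
# Hypothetical per-depth ids: the last depth contains only a value first
# seen at the top, so its largest id is 1.
last_id <- c(2, 4, 4, 5, 6, 1)

# cummax carries the running maximum forward, so the count never decreases.
running_count <- cummax(last_id)

print(running_count)  # 2 4 4 5 6 6
```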
Edit: The above code was for illustrative purposes. In reality you don't need a 3rd column at all. This is how I'd write the final code.
require(data.table)
dt <- as.data.table(unique(df))
setkey(dt, "depth")
dt[, value := as.numeric(factor(value, levels = unique(value)))]
setkey(dt, "depth", "value")
dt.out <- dt[J(unique(depth)), mult="last"]
dt.out[, value := cummax(value)]
Here's a trickier example and the output from the code:
df <- structure(list(depth = c(1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6),
value = structure(c(1L, 2L, 3L, 4L, 1L, 3L, 4L, 5L, 6L, 1L, 1L),
.Label = c("a", "b", "c", "d", "f", "g"), class = "factor")),
.Names = c("depth", "value"), row.names = c(NA, -11L),
class = "data.frame")
# depth value
# 1: 1 2
# 2: 2 4
# 3: 3 4
# 4: 4 5
# 5: 5 6
# 6: 6 6
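As a cross-check that does not need data.table, the same result can be sketched in base R. This is a different technique from the one in the answer: it flags the first occurrence of each value with duplicated(), accumulates those flags with cumsum(), and takes the maximum per depth. The names `chk`, `running` and `cum_unique` are introduced here for illustration only, and the data is the trickier example rebuilt with plain character values.

```r
# Same data as the trickier example, with value as a character column.
chk <- data.frame(
  depth = c(1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 6),
  value = c("a", "b", "c", "d", "a", "c", "d", "f", "g", "a", "a"),
  stringsAsFactors = FALSE
)

# Process rows from the top down.
chk <- chk[order(chk$depth), ]

# Running count of distinct values: add 1 each time a value is seen first.
running <- cumsum(!duplicated(chk$value))

# The running count never decreases, so the max within a depth is its final value.
per_depth <- tapply(running, chk$depth, max)

out <- data.frame(depth = as.numeric(names(per_depth)),
                  cum_unique = as.vector(per_depth))
print(out)
```

The cum_unique column should agree with the value column in the output shown above.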