Index unique values in data.table

Tags: r, data.table

Not sure how to formulate the question in words, but how can I create an index column for a data.table that, within each group, increments whenever a different value appears?

Here is the MWE

library(data.table)
in.data <- data.table(fruits=c(rep("banana", 4), rep("pear", 5)),vendor=c("a", "b", "b", "c", "d", "d", "e", "f", "f"))

Here is the result the R-code should generate

in.data[, wanted.column:=c(1,2,2,3,1,1,2,3,3)]

#    fruits vendor wanted.column
# 1: banana      a             1
# 2: banana      b             2
# 3: banana      b             2
# 4: banana      c             3
# 5:   pear      d             1
# 6:   pear      d             1
# 7:   pear      e             2
# 8:   pear      f             3
# 9:   pear      f             3

So it labels each vendor 1, 2, 3, ... within each fruit. There is probably a very simple solution, but I'm stuck.

asked Feb 12 '16 20:02

Chris

3 Answers

I have a few ideas. You can use a nested group counter:

in.data[, w := setDT(list(v = vendor))[, g := .GRP, by=v]$g, by=fruits]

Alternatively, make a run ID, which relies on the data being sorted so that each vendor's rows are adjacent within a fruit (thanks @eddi) and seems wasteful:

in.data[, w := rleid(vendor), by=fruits]

The approach built on base-R functions would probably be:

in.data[, w := match(vendor, unique(vendor)), by=fruits]

# or in base R ...

# ave() returns the type of its first argument (character here), hence as.integer()
in.data$w = as.integer(with(in.data, ave(vendor, fruits, FUN = function(x) match(x, unique(x)))))
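As a quick sanity check, the following sketch runs the three data.table variants above on the question's example and compares each with the wanted column. The column names w_grp, w_rle and w_match are made up here for illustration, and data.table(v = vendor) is swapped in for setDT(list(...)) purely for readability.

```r
library(data.table)

in.data <- data.table(
  fruits = c(rep("banana", 4), rep("pear", 5)),
  vendor = c("a", "b", "b", "c", "d", "d", "e", "f", "f")
)
wanted <- c(1L, 2L, 2L, 3L, 1L, 1L, 2L, 3L, 3L)

# nested group counter: number the vendors inside each fruit with .GRP
in.data[, w_grp := data.table(v = vendor)[, g := .GRP, by = v]$g, by = fruits]
# run ID: increments whenever vendor changes from one row to the next
in.data[, w_rle := rleid(vendor), by = fruits]
# position of each vendor among the unique vendors of its fruit
in.data[, w_match := match(vendor, unique(vendor)), by = fruits]

# all three agree with the wanted column on this (sorted) example
stopifnot(
  identical(in.data$w_grp, wanted),
  identical(in.data$w_rle, wanted),
  identical(in.data$w_match, wanted)
)
```

On this data the three columns are identical. They can differ once a vendor's rows are no longer adjacent within a fruit.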

answered Oct 13 '22 19:10

Frank

Another approach might be two steps:

DT = data.table(fruits=c(rep("banana", 4), rep("pear", 5)),vendor=c("a", "b", "b", "c", "d", "d", "e", "f", "f"))
DT
#    fruits vendor
# 1: banana      a
# 2: banana      b
# 3: banana      b
# 4: banana      c
# 5:   pear      d
# 6:   pear      d
# 7:   pear      e
# 8:   pear      f
# 9:   pear      f
DT[, wanted:=.GRP, by="fruits,vendor"]  # step 1
DT
#    fruits vendor wanted
# 1: banana      a      1
# 2: banana      b      2
# 3: banana      b      2
# 4: banana      c      3
# 5:   pear      d      4
# 6:   pear      d      4
# 7:   pear      e      5
# 8:   pear      f      6
# 9:   pear      f      6
DT[, wanted:=wanted-wanted[1]+1L, by="fruits"]  # step 2 (adjust)
DT
#    fruits vendor wanted
# 1: banana      a      1
# 2: banana      b      2
# 3: banana      b      2
# 4: banana      c      3
# 5:   pear      d      1
# 6:   pear      d      1
# 7:   pear      e      2
# 8:   pear      f      3
# 9:   pear      f      3

The way I would comment this in production code might be:

DT[, wanted:=.GRP, by="fruits,vendor"]          # .GRP is simple group counter
DT[, wanted:=wanted-wanted[1]+1L, by="fruits"]  # reset vendor counter per fruit
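One caveat worth checking: the adjustment in step 2 appears to rely on the rows of each fruit being contiguous, because .GRP numbers the (fruits, vendor) groups in order of first appearance across the whole table. The sketch below uses a made-up interleaved table to illustrate this (DT2 and wanted_match are hypothetical names, not from the question).

```r
library(data.table)

# hypothetical data: the two banana rows are separated by a pear row
DT2 <- data.table(fruits = c("banana", "pear", "banana"),
                  vendor = c("a", "d", "b"))

DT2[, wanted := .GRP, by = "fruits,vendor"]              # groups numbered 1, 2, 3
DT2[, wanted := wanted - wanted[1] + 1L, by = "fruits"]  # reset per fruit
DT2$wanted
# 1 1 3   (banana gets 1 and 3, so there is a gap)

# for comparison, match() within each fruit does not depend on row order
DT2[, wanted_match := match(vendor, unique(vendor)), by = fruits]
DT2$wanted_match
# 1 1 2
```

If the table may be interleaved like this, sorting by fruits before step 1 should restore the gap-free numbering.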

answered Oct 13 '22 18:10

Matt Dowle

If you want every occurrence of a vendor within a given fruit to get the same index, even when its rows are not adjacent, then this is another option:

in.data[, wanted := as.integer(factor(vendor, levels = unique(vendor))), by = fruits]

Otherwise, if you want it to tick up every time the vendor changes, then, from the given answers so far, rleid is the only one that works.
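To make the distinction concrete, here is a small sketch on hypothetical data where vendor "a" reappears after "b" within the same fruit (DT3 and its column names are invented for illustration). match() and the factor approach reuse the earlier index, while rleid() ticks up on every change.

```r
library(data.table)

# hypothetical data: vendor "a" comes back after "b"
DT3 <- data.table(fruits = "banana", vendor = c("a", "b", "b", "a"))

DT3[, by_match  := match(vendor, unique(vendor)), by = fruits]
DT3[, by_factor := as.integer(factor(vendor, levels = unique(vendor))), by = fruits]
DT3[, by_rleid  := rleid(vendor), by = fruits]
DT3
#    fruits vendor by_match by_factor by_rleid
# 1: banana      a        1         1        1
# 2: banana      b        2         2        2
# 3: banana      b        2         2        2
# 4: banana      a        1         1        3
```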

answered Oct 13 '22 20:10

eddi
