Not sure how to formulate the question in words, but how can I create an index-column for a data.table that per group increments when a different value appear?
Here is the MWE
library(data.table)
in.data <- data.table(fruits=c(rep("banana", 4), rep("pear", 5)),vendor=c("a", "b", "b", "c", "d", "d", "e", "f", "f"))
Here is the result the R-code should generate
in.data[, wanted.column:=c(1,2,2,3,1,1,2,3,3)]
# fruits vendor wanted.column
# 1: banana a 1
# 2: banana b 2
# 3: banana b 2
# 4: banana c 3
# 5: pear d 1
# 6: pear d 1
# 7: pear e 2
# 8: pear f 3
# 9: pear f 3
So it labels each vendor 1, 2, 3, ... within each fruit. There is probably a very simple solution, but I'm stuck.
If two entities are connected via a many to many relationship, you create a link table, storing their foreign keys, so there is no table with 2 unique indexes. One instance can be used only once, so you must have 2 uniqie indices in a linked table (by each entity separately) to prevent use of instance already used.
Unique indexes are indexes that help maintain data integrity by ensuring that no rows of data in a table have identical key values. When you create a unique index for an existing table with data, values in the columns or expressions that comprise the index key are checked for uniqueness.
Right-click the table on which you want to create a unique index and select Design. On the Table Designer menu, select Indexes/Keys. In the Indexes/Keys dialog box, click Add. Select the new index in the Selected Primary/Unique Key or Index text box.
The unique piece is not where the difference lies. The index and key are not the same thing, and are not comparable. A key is a data column, or several columns, that are forced to be unique with a constraint, either primary key or explicitly defined unique constraint.
I have a few ideas. You can use a nested group counter:
in.data[, w := setDT(list(v = vendor))[, g := .GRP, by=v]$g, by=fruits]
Alternately, make a run ID, which depends on sorted data (thanks @eddi) and seems wasteful:
in.data[, w := rleid(vendor), by=fruits]
The base-R approach would probably be:
in.data[, w := match(vendor, unique(vendor)), by=fruits]
# or in base R ...
in.data$w = with(in.data, ave(vendor, fruits, FUN = function(x) match(x, unique(x))))
Another approach might be two steps :
DT = data.table(fruits=c(rep("banana", 4), rep("pear", 5)),vendor=c("a", "b", "b", "c", "d", "d", "e", "f", "f"))
DT
fruits vendor
1: banana a
2: banana b
3: banana b
4: banana c
5: pear d
6: pear d
7: pear e
8: pear f
9: pear f
DT[, wanted:=.GRP, by="fruits,vendor"] # step 1
DT
fruits vendor wanted
1: banana a 1
2: banana b 2
3: banana b 2
4: banana c 3
5: pear d 4
6: pear d 4
7: pear e 5
8: pear f 6
9: pear f 6
DT[, wanted:=wanted-wanted[1]+1L, by="fruits"] # step 2 (adjust)
DT
fruits vendor wanted
1: banana a 1
2: banana b 2
3: banana b 2
4: banana c 3
5: pear d 1
6: pear d 1
7: pear e 2
8: pear f 3
9: pear f 3
>
The way I would comment this in production code might be :
DT[, wanted:=.GRP, by="fruits,vendor"] # .GRP is simple group counter
DT[, wanted:=wanted-wanted[1]+1L, by="fruits"] # reset vendor counter per fruit
If you want the index to be the same for all vendors within a given fruit, then this is another option:
in.data[, wanted := as.integer(factor(vendor, levels = unique(vendor))), by = fruits]
Otherwise, if you want it to tick up every time the vendor changes, then, from the given answers so far, rleid
is the only one that works.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With