Efficient use of R data.table and unique()

Tags:

data.table

Is there a more efficient query than the following

DT[, list(length(unique(OrderNo)) ),customerID]

for a long-format table with customer IDs, order numbers, and product line items? Because each line item is its own row, an order number is repeated whenever a customer bought more than one item in that transaction.

I am trying to count unique purchases per customer. length() alone counts every order ID for a customer, duplicates included; I want only the number of distinct ones.

Edit from here:

Here is some dummy data. Ideally, what I am looking for is the output of the first query, the one using unique().

df <- data.frame(
             customerID=as.factor(c(rep("A",3),rep("B",4))),
             product=as.factor(c(rep("widget",2),rep("otherstuff",5))),
             orderID=as.factor(c("xyz","xyz","abd","qwe","rty","yui","poi")),
             OrderDate=as.Date(c("2013-07-01","2013-07-01","2013-07-03","2013-06-01","2013-06-02","2013-06-03","2013-07-01"))
             )

library(data.table)
DT.eg <- as.data.table(df)
# Gives unique order counts per customer
DT.eg[, list(orderlength = length(unique(orderID)) ),customerID]
# Gives counts of all orders by customer
DT.eg[,.SD, keyby=list(orderID, customerID)][, .N, by=customerID]

Note (R.S.): in the last query above, .SD should be .N.

asked Oct 24 '13 01:10

digdeep

2 Answers

If you are trying to count the number of unique purchases per customer, use

 DT[, .N, keyby=list(customerID, OrderNo)][, .N, by=customerID]
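
To make the two chained steps concrete, here is a minimal sketch on a small table modeled on the question's dummy data. The name `DT.demo` and the intermediate variable names are made up for illustration. The first step collapses the table to one row per (orderID, customerID) pair; the second counts those rows per customer. For contrast, the last query uses `.SD` in the first step, which keeps every line item, so the final count includes the duplicates.

```r
library(data.table)

# Hypothetical table mirroring the question's dummy data
DT.demo <- data.table(
  customerID = c(rep("A", 3), rep("B", 4)),
  product    = c(rep("widget", 2), rep("otherstuff", 5)),
  orderID    = c("xyz", "xyz", "abd", "qwe", "rty", "yui", "poi")
)

# Step 1: one row per distinct (orderID, customerID) pair;
# N in this intermediate table is the number of line items in that order
per_order <- DT.demo[, .N, keyby = list(orderID, customerID)]

# Step 2: count the distinct orders per customer
per_customer <- per_order[, .N, by = customerID]
print(per_customer)

# Contrast: .SD in the first step returns every row of each group,
# so the second step counts line items rather than distinct orders
all_items <- DT.demo[, .SD, keyby = list(orderID, customerID)][, .N, by = customerID]
print(all_items)
```

On this data, customer A has three line items but only two distinct orders, so the two results differ for A and agree for B.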

answered Oct 05 '22 12:10

Ricardo Saporta

As of version 1.9.6 (on CRAN 19 Sep 2015), data.table has gained the helper function uniqueN() which is equivalent to length(unique(x)) but much faster (according to data.table NEWS).

With this,

DT.eg[, list(orderlength = length(unique(orderID)) ),customerID]

and

DT.eg[,.N, keyby=list(orderID, customerID)][, .N, by=customerID]

can be rewritten as

DT.eg[, .(orderlength = uniqueN(orderID)), customerID]

   customerID orderlength
1:          A           2
2:          B           4
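
As a quick sanity check, the sketch below compares the `length(unique())` form with `uniqueN()` on a small table modeled on the question's dummy data. It assumes data.table 1.9.6 or later; `DT.demo` and the result names are hypothetical. It also shows one further alternative that is not from the answers above: dropping duplicate (customerID, orderID) pairs with `unique(..., by = ...)` and then counting rows.

```r
library(data.table)

# Hypothetical table mirroring the question's dummy data
DT.demo <- data.table(
  customerID = c(rep("A", 3), rep("B", 4)),
  orderID    = c("xyz", "xyz", "abd", "qwe", "rty", "yui", "poi")
)

# Original form: base R length(unique()) evaluated per group
old_way <- DT.demo[, .(orderlength = length(unique(orderID))), by = customerID]

# Rewritten form: data.table's uniqueN() helper
new_way <- DT.demo[, .(orderlength = uniqueN(orderID)), by = customerID]

# Alternative sketch: de-duplicate the pairs first, then count rows per customer
dedup_way <- unique(DT.demo, by = c("customerID", "orderID"))[
  , .(orderlength = .N), by = customerID]

# All three results should agree
print(isTRUE(all.equal(old_way, new_way)))
print(isTRUE(all.equal(old_way, dedup_way)))

# uniqueN() also accepts a data.table plus `by` columns,
# giving the total number of distinct (customerID, orderID) pairs
n_pairs <- uniqueN(DT.demo, by = c("customerID", "orderID"))
print(n_pairs)
```

Which form is fastest will depend on the size of the table and the number of groups, so it is worth timing them on the real data rather than relying on this toy example.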

answered Oct 05 '22 14:10

Uwe
