I have the following data table x
id1 id2
a x
a x
a y
b z
For each combination of id1, id2 I can find the number of instances in the following way
x[,list(
freq = .N
),by = "id1,id2"]
The above would yield
a x 2
a y 1
b z 1
Next I want to find the most frequent id2 for each id1, i.e. the mode. So the expected result is
a x 2
b z 1
I can get there in a roundabout way, but is there a way to put a sequence number at the id1 level? Or some similar hack that gets me there efficiently, perhaps at the first step shown above? Thanks in advance.
R does not have a standard in-built function to calculate mode. So we create a user function to calculate mode of a data set in R. This function takes the vector as input and gives the mode value as output.
In R, mean() and median() are standard functions which do what you'd expect. mode() tells you the internal storage mode of the object, not the value that occurs the most in its argument.
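As a minimal sketch of such a user function, the following base R code picks the most frequent value of a vector. The name stat_mode is hypothetical (it is not part of R or of any package), and the choice to return the first value encountered when there is a tie is an assumption of this sketch.

```r
# Hypothetical helper, not part of base R: returns the most frequent value of v.
# On ties, the value that appears first in v is returned.
stat_mode <- function(v) {
  uniq <- unique(v)                       # distinct values, in order of appearance
  counts <- tabulate(match(v, uniq))      # occurrences of each distinct value
  uniq[which.max(counts)]                 # which.max() picks the first maximum
}

stat_mode(c("x", "x", "y"))   # "x"
mode(c("x", "x", "y"))        # "character" (storage mode, not the statistical mode)
```

The last two calls illustrate the difference described above: the built-in mode() reports how the object is stored, while the helper reports the most common value.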
I'd do it this way (here dt is your data table x):
setkey(dt[, list(freq = .N), by=list(id1, id2)],
id1, freq)[J(unique(id1)), mult="last"]
id1 id2 freq
1: a x 2
2: b z 1
The idea is to first get the freq column (as you did). Then setkey on the resulting data.table with columns id1 and freq. This will already sort freq in ascending order within each id1. With this, we can then do a by-without-by subsetting and combine it with mult="last" (because for every group, the last value will be the biggest, as it's sorted in ascending order).
This saves a sort step for each group, which can get time-consuming as the number of groups increases. Note that this does not handle ties. That is, if the same id1 has two equal max values, then only one will be returned.
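If ties matter, one possible alternative is to filter within each id1 group instead of relying on the key order. This is only a sketch, not the approach described above: it rebuilds the table from the question, and the names counts, top and top_ties are made up for illustration. It trades the speed advantage mentioned above for explicit control over ties.

```r
library(data.table)

# The table from the question.
x <- data.table(id1 = c("a", "a", "a", "b"),
                id2 = c("x", "x", "y", "z"))

# Step 1: frequency of each (id1, id2) combination.
counts <- x[, list(freq = .N), by = list(id1, id2)]

# Step 2a: one row per id1, the first row with the largest freq.
top <- counts[, .SD[which.max(freq)], by = id1]

# Step 2b: every row tied for the largest freq within its id1.
top_ties <- counts[, .SD[freq == max(freq)], by = id1]

print(top)
```

For the sample data both variants give the same two rows (a x 2 and b z 1), since no id1 has a tie.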