I would like to go through a list, and check to see if that item is the most frequent item in the list up until that point. The solution I currently have is incredibly slow compared to Python. Is there an effective way to speed it up?
dat<-data.table(sample(1:50,10000,replace=T))
k<-1
correct <- 0 # total correct predictions
for (i in 2:(nrow(dat)-1)) {
if (dat[i,V1] %in% dat[1:(i-1),.N,by=V1][order(-N),head(.SD,k)][,V1]) {
correct <- correct + 1
}
}
More generally, I would eventually like to see if an item is one of the k most frequent items up until a point, or if it has one of the k highest values up until a point.
For comparison, here is a very fast implementation in Python:
dat=[random.randint(1,50) for i in range(10000)]
correct=0
k=1
list={}
for i in dat:
toplist=heapq.nlargest(k,list.iteritems(),key=operator.itemgetter(1))
toplist=[j[0] for j in toplist]
if i in toplist:
correct+=1
if list.has_key(i):
list[i]=list[i]+1
else:
list[i]=1
Here's what I've got so far (my solution is f3):
set.seed(10)
dat<-data.table(sample(1:3,100,replace=T))
k<-1
f3 <- function(dat) {
correct <- 0 # total correct predictions
vf <- factor(dat$V1)
v <- as.integer(vf)
tabs <- integer(max(v))
for (i in 2:(nrow(dat)-1)) {
tabs[v[i-1]] <- tabs[v[i-1]] + 1
#print(tabs)
#print(v[1:i])
if (match(v[i],order(tabs,decreasing = T))<=k) {
correct <- correct + 1
}
#print(correct)
#print('')
}
correct
}
f1 <- function(dat) {
correct <- 0 # total correct predictions
for (i in 2:(nrow(dat)-1)) {
if (dat[i,V1] %in% dat[1:(i-1),.N,by=V1][order(-N),head(.SD,k)]) {
correct <- correct + 1
}
}
correct
}
library(rbenchmark)
print(f1(dat)==f3(dat))
library(rbenchmark)
benchmark(f1(dat),f3(dat),replications=10)
The benchmark results:
test replications elapsed relative user.self sys.self user.child sys.child
1 f1(dat) 10 2.939 163.278 2.931 0.008 0 0
2 f3(dat) 10 0.018 1.000 0.018 0.000 0 0
are encouraging, but f3
has two problems:
It doesn't always provide the same answer as OP's algorithm because the ties are treated differently,
There is a lot of room for improvement, because tabs
are sorted every time anew.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With