 

R - Speed optimization and Chess Rankings

Hi there!

I'm trying to calculate chess rankings for a number of players in 6 different skills (C1, C2, ..., C6). I have a huge data frame (data) of games played, which looks like this (head(data)). In each game, one person (user) chooses which of two other people (p1/p2) will win.

row.names   user    p1  p2  skill   win looser       time
---------------------------------------------------------
2             KE    CL  HK     C1    CL     HK  433508371
25            KE    HK  JT     c1    HK     JT  433508401
35            KE    AB  JT     C1    AB     JT  433508444
110           NF    IP  HE     C1    HE     IP  433508837
78            NF    IP  AS     C1    AS     IP  433508848
82            NF    IT  CV     C1    CV     IT  433508860

In another table (old_users) I keep track of all players' chess scores in the 6 skills (head(old_users)).

    user    C1    C2    C3    C4    C5    C6                                    
1     BD  1200  1200  1200  1200  1200  1200                                    
2     NF  1200  1200  1200  1200  1200  1200                                    
3     CH  1200  1200  1200  1200  1200  1200                                    
4     AR  1200  1200  1200  1200  1200  1200                                    
5     AS  1200  1200  1200  1200  1200  1200                                    
6     MS  1200  1200  1200  1200  1200  1200                                    

The algorithm

The algorithm runs through data one row at a time in a for-loop, looking at the i'th row each time. It looks up p1's and p2's score data, retrieves the two players' scores for the skill played, calculates their new scores based on who wins or loses, and then updates the corresponding cells in old_users with the new rankings.
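The update itself is a scaled Elo-style rule (K = 32, divisor 200). A minimal sketch of just that step, pulled out into a function (the name update_scores is mine, purely for illustration):

```r
# Scaled Elo-style update (K = 32, divisor 200), matching the loop's formula.
# The function name 'update_scores' is illustrative, not part of the original code.
update_scores <- function(winner, looser, K = 32) {
  winner_new <- winner + (K / 2) * ( 1 + (1 / 2) * (looser - winner) / 200)
  looser_new <- looser + (K / 2) * (-1 + (1 / 2) * (winner - looser) / 200)
  c(winner = winner_new, looser = looser_new)
}

update_scores(1200, 1200)  # equal scores: winner gains 16, looser loses 16
```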

What I need to do

I need to do this as fast as possible; with data now being 6000+ rows for only 24 players, it takes a while to run through.

I've timed my current for-loop, and the result below is far too slow.

  user   system  elapsed 
104.72     0.28   118.02 

Questions

  1. Why does this algorithm take so long to run? Are there particular commands that are slow inside for-loops, etc.?
  2. How can I achieve what I want in a faster way?

Current for-loop

for (i in 1:dim(data)[1]) {
  tmp_data <- data[i, ]  # take the i'th row in data
  score_col <- which(colnames(old_users) == tmp_data$skill)  # old_users column matching the skill played
  winners_old_data <- old_users[which(old_users$user == tmp_data$win), ]     # fetch winner's old scores
  loosers_old_data <- old_users[which(old_users$user == tmp_data$looser), ]  # fetch looser's old scores

  # calculate the winner's and looser's new scores
  winners_new_score <- winners_old_data[score_col] + (32/2) * (1 - 0 + (1/2) * ((loosers_old_data[score_col] - winners_old_data[score_col]) / 200))
  loosers_new_score <- loosers_old_data[score_col] + (32/2) * (0 - 1 + (1/2) * ((winners_old_data[score_col] - loosers_old_data[score_col]) / 200))

  old_users[old_users$user == winners_old_data[[1]], score_col] <- winners_new_score  # update cell in old_users
  old_users[old_users$user == loosers_old_data[[1]], score_col] <- loosers_new_score  # update cell in old_users
}

Data to play with

https://drive.google.com/file/d/0BxE_CHLUGoS0WlczUkxLM3VtVjQ/edit?usp=sharing

Any help is very much appreciated

Thank you!

//HK

Asked Nov 01 '22 by user4098307


1 Answer

The data you posted is ridiculously small! To think I had to install something to unrar it...! If you could please post a much larger data set, I'll be able to test how useful my suggestion is.

I would recommend you turn the users data into a matrix with ids as rownames and skills as colnames. Why?

  1. You might get a small speed improvement by accessing the data via normal indexing rather than using which( == ) everywhere. Or at least it will make your code a lot more readable.

  2. More importantly, changing values within a matrix is done in place memory-wise, while with a data.frame I think your code keeps creating a whole new object every time, which must be time consuming.
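You can check that second claim yourself with a rough sketch like the one below (not a rigorous benchmark; the size is arbitrary):

```r
# Compare per-element writes into a data.frame vs. a matrix of the same size.
n  <- 1e4
df <- data.frame(a = numeric(n))
m  <- matrix(0, nrow = n, ncol = 1)

system.time(for (i in 1:n) df[i, 1] <- i)  # data.frame: slow, repeated copying/dispatch
system.time(for (i in 1:n) m[i, 1]  <- i)  # matrix: near-instant in-place writes
```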


# read and transform your data
# stringsAsFactors = FALSE so the win/looser/skill columns are characters,
# which is what character-based matrix indexing below requires
data  <- read.csv("data.txt", header = FALSE, stringsAsFactors = FALSE)
names(data) <- c("user", "p1", "p2", "skill", "win", "looser", "time")
users <- data.matrix(read.csv("users.txt", header = FALSE, row.names = 1))
colnames(users) <- paste0("C", 1:6)  # paste0, so the names are "C1".."C6" without a space

for (i in 1:nrow(data)) {
   game <- data[i, ]
   winner.old <- users[game$win,    game$skill]
   looser.old <- users[game$looser, game$skill]
   winner.new <- winner.old + 32/2 * (1 - 0 + (1/2) * (looser.old - winner.old) / 200)
   looser.new <- looser.old + 32/2 * (0 - 1 + (1/2) * (winner.old - looser.old) / 200)
   users[game$win,    game$skill] <- winner.new
   users[game$looser, game$skill] <- looser.new
}

Isn't it a lot easier to read? Hopefully it will be a bit faster as well; please test and let me know, or provide a larger data set we can play with. Thanks.
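If that still isn't fast enough, one further idea (an untested sketch, not part of the answer above): precompute integer row/column indices once with match(), so each loop iteration does only integer indexing instead of character lookups. The toy two-game data frame here is made up for illustration:

```r
# Toy data: two players, two games (made up for illustration).
users <- matrix(1200, nrow = 2, ncol = 6,
                dimnames = list(c("CL", "HK"), paste0("C", 1:6)))
games <- data.frame(win = c("CL", "HK"), looser = c("HK", "CL"),
                    skill = c("C1", "C1"), stringsAsFactors = FALSE)

# Translate names to integer indices once, outside the loop.
win_i   <- match(games$win,    rownames(users))
loose_i <- match(games$looser, rownames(users))
skill_i <- match(games$skill,  colnames(users))

for (i in seq_len(nrow(games))) {
  w <- users[win_i[i],   skill_i[i]]
  l <- users[loose_i[i], skill_i[i]]
  users[win_i[i],   skill_i[i]] <- w + 16 * ( 1 + (l - w) / 400)  # winner update
  users[loose_i[i], skill_i[i]] <- l + 16 * (-1 + (w - l) / 400)  # looser update
}

users["CL", "C1"]  # CL wins game 1 then loses game 2: ends near 1198.72
```

Note that 16 * (1 + (l - w) / 400) is just the original 32/2 * (1 - 0 + (1/2) * (l - w) / 200) simplified.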

Answered Nov 15 '22 by flodel