
Separate data into 2 groups with the elements of each pair in separate groups

Tags: dataframe, r

I have a dataset with two columns, player1 and player2, for a group of n players. Every pair of players plays twice: once with i as player1 and j as player2, and once with i as player2 and j as player1.

I want to split the data into a dataframe games1 that contains each pair's first game and a dataframe games2 that contains each pair's second game (so each sub-dataframe is half the size of the original).

I've considered looping over all the rows with a for loop and setting a flag that records whether the two players have already met. I was wondering whether there is an easier or faster way.

I have a data.frame()

# reproducible example
df1 <- read.table(text = "player1  player2
1:         1        2
2:         2        3
3:         3        2
4:         1        3
5:         2        1
6:         3        1", header = TRUE)

I need:

data.frame()
     player1  player2
1:         1        2
2:         2        3
3:         1        3

and

1:         3        2
2:         2        1
3:         3        1

Alaleh asked Feb 26 '18 12:02


2 Answers

First, build a key that identifies each pair of players regardless of who is player1 and who is player2. Then use that key for grouping:

# reproducible example
df1 <- read.table(text = "player1  player2
1:         1        2
2:         2        3
3:         3        2
4:         1        3
5:         2        1
6:         3        1", header = TRUE)
df1$players <- with(df1, 
        ifelse(player1 < player2, paste(player1, player2, sep='.'), paste(player2, player1, sep='.')))
# number the games within each pair; seq_along keeps the result an integer
df1$game <- ave(seq_along(df1$players), df1$players, FUN = seq_along)
# > df1
#    player1 player2 players game
# 1:       1       2     1.2    1
# 2:       2       3     2.3    1
# 3:       3       2     2.3    2
# 4:       1       3     1.3    1
# 5:       2       1     1.2    2
# 6:       3       1     1.3    2
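
Once the game number is known, the two data frames from the question can be pulled out of the result. The following is only a minimal sketch under some assumptions: it works on a plain data.frame, builds the key with pmin()/pmax() instead of ifelse(), and the names key, parts, games1 and games2 are chosen here for illustration.

```r
# same six games as in the example, without the row labels
df1 <- data.frame(player1 = c(1, 2, 3, 1, 2, 3),
                  player2 = c(2, 3, 2, 3, 1, 1))

# order-independent key: smaller player number first
key <- paste(pmin(df1$player1, df1$player2),
             pmax(df1$player1, df1$player2), sep = ".")

# 1 for a pair's first game, 2 for its second
game <- ave(seq_along(key), key, FUN = seq_along)

# split() returns a list named after the game numbers
parts <- split(df1, game)
games1 <- parts[["1"]]
games2 <- parts[["2"]]
```

Here games1 holds rows 1, 2 and 4 of the input and games2 holds rows 3, 5 and 6, which matches the output requested in the question.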

Here is a solution with data.table:

library("data.table")
# reproducible example
df1 <- read.table(text = "player1  player2
1:         1        2
2:         2        3
3:         3        2
4:         1        3
5:         2        1
6:         3        1", header = TRUE)
setDT(df1)
df1[, players:=ifelse(player1 < player2, paste(player1, player2, sep='.'), paste(player2, player1, sep='.'))]
df1[, game := seq_len(.N), by = players][]

Using the function rowid(), this can be shortened to (thanks to @Frank):

df1[, game := rowid(paste(pmin(player1, player2), pmax(player1, player2)))]
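
To see what rowid() contributes in that one-liner, here is a small sketch. It assumes the data.table package is installed, and the vector key is a hypothetical stand-in for the pasted pmin()/pmax() labels of the six example games.

```r
library(data.table)

# hypothetical key vector: one order-independent label per game
key <- c("1 2", "2 3", "2 3", "1 3", "1 2", "1 3")

# rowid() numbers the occurrences of each value in order of appearance
game <- rowid(key)
game
```

Each label gets 1 on its first appearance and 2 on its second, so game is 1 1 2 1 2 2, the same numbering as in the longer variants.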

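In both variants the groups can then be separated with split(). For the data.table variant, the split() method provided by data.table accepts by=:

split(df1, by = "game", keep.by = FALSE)

The result is a list of two data.table objects. For the plain data.frame of the first variant, base R's split(df1, df1$game) returns a list of two data frames instead.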

jogo answered Nov 15 '22 10:11


A slightly ugly solution is to sort each row, so that both games of a pair look identical, and then get the two groups with duplicated(...) and duplicated(..., fromLast = TRUE), i.e.

d1 <- data.frame(t(apply(df1, 1, function(i) sort(i, decreasing = TRUE))))

df1[!duplicated(d1),]
#   player1 player2
#1:       1       2
#2:       2       3
#4:       1       3

#AND

df1[!duplicated(d1, fromLast = TRUE),]
#   player1 player2
#3:       3       2
#5:       2       1
#6:       3       1

To avoid cluttering your global environment with many objects, you can store both results in a list, i.e.

list1 <- list(df1[!duplicated(d1),], df1[!duplicated(d1, fromLast = TRUE),])
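
The same idea can be written as a self-contained sketch with a named list, so each part can be accessed by name. The names games, games1 and games2 are assumptions made here for illustration, and the rows are sorted in increasing order, which does not change which rows count as duplicates.

```r
# same six games as in the example, without the row labels
df1 <- data.frame(player1 = c(1, 2, 3, 1, 2, 3),
                  player2 = c(2, 3, 2, 3, 1, 1))

# sort each row so that (2, 1) and (1, 2) become identical
d1 <- data.frame(t(apply(df1, 1, sort)))

# first occurrence of each pair, then last occurrence of each pair
games <- list(games1 = df1[!duplicated(d1), ],
              games2 = df1[!duplicated(d1, fromLast = TRUE), ])
games$games2
```

This relies on every pair appearing exactly twice. If a pair met three times, the middle game would be neither a first nor a last occurrence and would end up in neither element.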

Sotos answered Nov 15 '22 10:11