Let' say I have two data.frames
name_df = read.table(text = "player_name
a
b
c
d
e
f
g", header = T)
game_df = read.table(text = "game_id winner_name loser_name
1 a b
2 b a
3 a c
4 a d
5 b c
6 c d
7 d e
8 e f
9 f a
10 g f
11 g a
12 f e
13 a d", header = T)
name_df
contains a unique list of all the winner_name
or loser_name
values in game_df
. I want to create a new data.frame that has, for each person in the name_df
a row if a given name (e.g. a
) appears in either the winner_name
or loser_name
column
So I essentially want to merge game_df
with name_df
, but the key column (name
) can appear in either winner_name
or loser_name
.
So, for just a
and b
the final output would look something like:
final_df = read.table(text = "player_name game_id winner_name loser_name
a 1 a b
a 2 b a
a 3 a c
a 4 a d
a 9 f a
a 11 g a
a 13 a d
b 1 a b
b 2 b a
b 5 b c", header = T)
You can use the following basic syntax to find the rows of a data frame in R in which a certain value appears in any of the columns: library(dplyr) df %>% filter_all(any_vars(. %in% c('value1', 'value2', ...)))
Tabulation is the planned or structured statistical data arrangement in rows or columns.
Extract all rows from a range that meet criteria in one column [Array formula] The array formula in cell B20 extracts records where column E equals either "South" or "East". To enter an array formula, type the formula in a cell then press and hold CTRL + SHIFT simultaneously, now press Enter once.
The following syntax shows how to select all rows of the DataFrame that contain the value 25 in any of the columns: df [df.isin( [25]).any(axis=1)] points assists rebounds 0 25 5 11 The following syntax shows how to select all rows of the DataFrame that contain the values 25, 9, or 6 in any of the columns:
The array formula in cell B20 extracts records where column E equals either "South" or "East". To enter an array formula, type the formula in a cell then press and hold CTRL + SHIFT simultaneously, now press Enter once. Release all keys.
The following syntax shows how to select all rows of the DataFrame that contain the character G in any of the columns: df [df.isin( ['G']).any(axis=1)] points assists position 0 25 5 G 1 12 7 G
We can loop over the elements in 'name_df' for 'player_name', filter
the rows from 'game_df' for either the 'winner_name' or 'loser_name'
library(dplyr)
library(purrr)
map_dfr(setNames(name_df$player_name, name_df$player_name),
~ game_df %>%
filter(winner_name %in% .x|loser_name %in% .x), .id = 'player_name')
Or if there are many columns, use if_any
map_dfr(setNames(name_df$player_name, name_df$player_name),
~ {
nm1 <- .x
game_df %>%
filter(if_any(c(winner_name, loser_name), ~ . %in% nm1))
}, .id = 'player_name')
Dedicated to our teacher and mentor dear @akrun
I think we can also make use of the add_row()
function you first taught me the other day. Unbelievable!!!
library(dplyr)
library(purrr)
library(tibble)
game_df %>%
rowwise() %>%
mutate(player_name = winner_name) %>%
group_split(game_id) %>%
map_dfr(~ add_row(.x, game_id = .x$game_id, winner_name = .x$winner_name,
loser_name = .x$loser_name, player_name = .x$loser_name)) %>%
arrange(player_name) %>%
relocate(player_name)
# A tibble: 26 x 4
player_name game_id winner_name loser_name
<chr> <int> <chr> <chr>
1 a 1 a b
2 a 2 b a
3 a 3 a c
4 a 4 a d
5 a 9 f a
6 a 11 g a
7 a 13 a d
8 b 1 a b
9 b 2 b a
10 b 5 b c
# ... with 16 more rows
This can be directly expressed in SQL:
library(sqldf)
sqldf("select *
from name_df
left join game_df on winner_name = player_name or loser_name = player_name")
Without using purrr
. I think this is appropriate use case of tidyr::unite
with argument remove = F
where we can first unite the winners' and losers' names and then use tidyr::separate_rows
to split new column into rows.
library(tidyr)
library(dplyr)
game_df %>% unite(Player_name, winner_name, loser_name, remove = F, sep = ', ') %>%
separate_rows(Player_name) %>%
relocate(Player_name) %>%
arrange(Player_name)
# A tibble: 26 x 4
Player_name game_id winner_name loser_name
<chr> <int> <chr> <chr>
1 a 1 a b
2 a 2 b a
3 a 3 a c
4 a 4 a d
5 a 9 f a
6 a 11 g a
7 a 13 a d
8 b 1 a b
9 b 2 b a
10 b 5 b c
# ... with 16 more rows
A Base R approach :
result <- do.call(rbind, lapply(name_df$player_name, function(x)
cbind(plaername = x,
subset(game_df, winner_name == x | loser_name == x))))
rownames(result) <- NULL
result
# playername game_id winner_name loser_name
#1 a 1 a b
#2 a 2 b a
#3 a 3 a c
#4 a 4 a d
#5 a 9 f a
#6 a 11 g a
#7 a 13 a d
#8 b 1 a b
#...
#...
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With