Extract rows where value appears in any of multiple columns

Let's say I have two data.frames:

name_df = read.table(text = "player_name
a
b
c
d
e
f
g", header = T)

game_df = read.table(text = "game_id winner_name loser_name
1 a b
2 b a
3 a c
4 a d
5 b c
6 c d
7 d e
8 e f
9 f a
10 g f
11 g a
12 f e
13 a d", header = T)

name_df contains a unique list of all the winner_name and loser_name values in game_df. I want to create a new data.frame that, for each player in name_df, contains one row for every game in which that name (e.g. a) appears in either the winner_name or the loser_name column.

So I essentially want to merge game_df with name_df, but the key column (player_name) can match either winner_name or loser_name.

So, for just a and b the final output would look something like:

final_df = read.table(text = "player_name game_id winner_name loser_name
a 1 a b
a 2 b a
a 3 a c
a 4 a d
a 9 f a
a 11 g a
a 13 a d
b 1 a b
b 2 b a
b 5 b c", header = T)
Parseltongue asked Apr 24 '21 22:04


5 Answers

We can loop over the 'player_name' values in 'name_df' and, for each one, filter the rows of 'game_df' where it appears in either 'winner_name' or 'loser_name':

library(dplyr)
library(purrr)
map_dfr(setNames(name_df$player_name, name_df$player_name), 
   ~ game_df %>%
        filter(winner_name %in% .x|loser_name %in% .x), .id = 'player_name')

Or, if there are many columns to check, use if_any:

map_dfr(setNames(name_df$player_name, name_df$player_name), 
  ~ {
     nm1 <- .x
     game_df %>%
       filter(if_any(c(winner_name, loser_name), ~ . %in%  nm1))
      }, .id = 'player_name')
akrun answered Oct 25 '22 14:10


Dedicated to our teacher and mentor dear @akrun

I think we can also make use of the add_row() function you first taught me the other day. Unbelievable!!!

library(dplyr)
library(purrr)
library(tibble)

game_df %>%
  rowwise() %>%
  mutate(player_name = winner_name) %>%
  group_split(game_id) %>%
  map_dfr(~ add_row(.x, game_id = .x$game_id, winner_name = .x$winner_name, 
                    loser_name = .x$loser_name, player_name = .x$loser_name)) %>%
  arrange(player_name) %>%
  relocate(player_name)


# A tibble: 26 x 4
   player_name game_id winner_name loser_name
   <chr>         <int> <chr>       <chr>     
 1 a                 1 a           b         
 2 a                 2 b           a         
 3 a                 3 a           c         
 4 a                 4 a           d         
 5 a                 9 f           a         
 6 a                11 g           a         
 7 a                13 a           d         
 8 b                 1 a           b         
 9 b                 2 b           a         
10 b                 5 b           c         
# ... with 16 more rows

Anoushiravan R answered Oct 25 '22 13:10


This can be directly expressed in SQL:

library(sqldf)

sqldf("select * 
  from name_df 
  left join game_df on winner_name = player_name or loser_name = player_name")
G. Grothendieck answered Oct 25 '22 12:10


Without using purrr: I think this is an appropriate use case for tidyr::unite with the argument remove = FALSE. We first unite the winners' and losers' names into a new column, then use tidyr::separate_rows to split that column into rows.

library(tidyr)
library(dplyr)

game_df %>%
  unite(Player_name, winner_name, loser_name, remove = FALSE, sep = ', ') %>%
  separate_rows(Player_name) %>%
  relocate(Player_name) %>%
  arrange(Player_name)

# A tibble: 26 x 4
   Player_name game_id winner_name loser_name
   <chr>         <int> <chr>       <chr>     
 1 a                 1 a           b         
 2 a                 2 b           a         
 3 a                 3 a           c         
 4 a                 4 a           d         
 5 a                 9 f           a         
 6 a                11 g           a         
 7 a                13 a           d         
 8 b                 1 a           b         
 9 b                 2 b           a         
10 b                 5 b           c         
# ... with 16 more rows
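
As a variation on the unite/separate_rows idea above, the same reshaping can probably also be done with tidyr::pivot_longer, which avoids pasting the names into a string and splitting them again (that could matter if a name ever contained a separator character). This is only a sketch and not part of the original answer: the helper columns winner and loser and the result name long_df are made up for illustration, and game_df is re-created from the question so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same game_df as in the question, re-created so this block is self-contained.
game_df <- read.table(text = "game_id winner_name loser_name
1 a b
2 b a
3 a c
4 a d
5 b c
6 c d
7 d e
8 e f
9 f a
10 g f
11 g a
12 f e
13 a d", header = TRUE, stringsAsFactors = FALSE)

long_df <- game_df %>%
  # Copy both name columns (hypothetical helper names) so the originals survive.
  mutate(winner = winner_name, loser = loser_name) %>%
  # Stack the copies: every game becomes two rows, one per participant.
  pivot_longer(c(winner, loser), names_to = "role", values_to = "player_name") %>%
  select(player_name, game_id, winner_name, loser_name) %>%
  arrange(player_name, game_id)

long_df
```

Note that, like the unite approach, this lists every name found in game_df rather than joining against name_df, so it assumes name_df really does cover all players.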
AnilGoyal answered Oct 25 '22 14:10


A base R approach:

result <- do.call(rbind, lapply(name_df$player_name, function(x) 
                   cbind(playername = x, 
                   subset(game_df, winner_name == x | loser_name == x))))

rownames(result) <- NULL

result
#   playername game_id winner_name loser_name
#1           a       1           a          b
#2           a       2           b          a
#3           a       3           a          c
#4           a       4           a          d
#5           a       9           f          a
#6           a      11           g          a
#7           a      13           a          d
#8           b       1           a          b
#...
#...
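
Another base R possibility, sketched here as an assumption rather than as part of the original answer, is to build the Cartesian product of the two data.frames with merge(..., by = NULL) and keep only the pairs where the player matches either name column. The object names pairs and matched are invented for illustration, and both data.frames are re-created from the question so the block is self-contained.

```r
# Data from the question, re-created so this block runs on its own.
name_df <- read.table(text = "player_name
a
b
c
d
e
f
g", header = TRUE, stringsAsFactors = FALSE)

game_df <- read.table(text = "game_id winner_name loser_name
1 a b
2 b a
3 a c
4 a d
5 b c
6 c d
7 d e
8 e f
9 f a
10 g f
11 g a
12 f e
13 a d", header = TRUE, stringsAsFactors = FALSE)

# by = NULL asks merge() for the Cartesian product (every player x every game).
pairs <- merge(name_df, game_df, by = NULL)

# Keep only the pairs where the player is the winner or the loser.
keep <- pairs$player_name == pairs$winner_name |
  pairs$player_name == pairs$loser_name
matched <- pairs[keep, ]

# Sort by player, then game, and drop the leftover row names.
matched <- matched[order(matched$player_name, matched$game_id), ]
rownames(matched) <- NULL

matched
```

The intermediate product has nrow(name_df) * nrow(game_df) rows, so this is only reasonable for small tables like the ones in the question.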
Ronak Shah answered Oct 25 '22 12:10