dplyr mutate a variable by comparing a variable and vectors of different sizes

I have a data frame of the following type:

df <- tibble::tribble(~x,
                      c("A", "B"),
                      c("A", "B", "C"),
                      c("A", "B", "C", "D"),
                      c("A", "B"))

and vectors like these:

vec1 <- c("A", "B")
vec2 <- c("A", "B", "C")
vec3 <- c("A", "B", "C", "D")

I want to add a variable y that shows which vector each row matches. I tried the following, but I get an empty y variable and the warning: "longer object length is not a multiple of shorter object length"

library(dplyr)

df_new <- df %>%
  mutate(y = case_when(x == vec1 ~ "vec1",
                       x == vec2 ~ "vec2",
                       x == vec3 ~ "vec3"))
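For reference, here is a minimal base R sketch of where that warning comes from. It uses only two of the question's vectors as plain character vectors, not the list column itself. `==` compares element by element and recycles the shorter operand, while `identical()` compares two whole vectors and returns a single TRUE or FALSE. The variable names `msg` and `cmp` are illustrative only.

```r
# Two of the reference vectors from the question
vec1 <- c("A", "B")
vec2 <- c("A", "B", "C")

# `==` works element by element: vec1 is recycled to length 3,
# which triggers the "longer object length ..." warning
msg <- tryCatch(vec1 == vec2, warning = function(w) conditionMessage(w))
print(msg)

# The element-wise result itself, with the warning silenced
cmp <- suppressWarnings(vec1 == vec2)
print(cmp)

# identical() compares the vectors as a whole and returns one logical value
print(identical(vec1, vec2))         # FALSE
print(identical(vec1, c("A", "B")))  # TRUE
```

The key point is that a "same vector?" test needs a single logical value per row, which is what `identical()` provides.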

The desired output is:

df_new <- tibble::tribble(~x,                      ~y,
                          c("A", "B"),             "vec1",
                          c("A", "B", "C"),        "vec2",
                          c("A", "B", "C", "D"),   "vec3",
                          c("A", "B"),             "vec1")
asked Apr 23 '18 20:04 by Geet


2 Answers

A solution using map2_lgl() and identical() to check whether each row's vector is exactly the same as a given vector.

library(tidyverse)

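# Compare each row's vector with a reference vector as a whole;
# map2_lgl() recycles the length-1 list(vecN) across all rows of x
df_new <- df %>%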
  mutate(y = case_when(
    map2_lgl(x, list(vec1), ~identical(.x, .y))  ~"vec1",
    map2_lgl(x, list(vec2), ~identical(.x, .y))  ~"vec2",
    map2_lgl(x, list(vec3), ~identical(.x, .y))  ~"vec3"
  ))
df_new
# # A tibble: 4 x 2
#   x         y    
#   <list>    <chr>
# 1 <chr [2]> vec1 
# 2 <chr [3]> vec2 
# 3 <chr [4]> vec3 
# 4 <chr [2]> vec1 
answered Oct 09 '22 04:10 by www
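The identical() idea above can also be written without one case_when() branch per vector. What follows is a minimal base R sketch, assuming the reference vectors are collected in a named list. `refs` and the helper `which_ref()` are hypothetical names, and the list column is written out as a plain list so that the example runs on its own.

```r
# Reference vectors from the question, collected in a named list
refs <- list(
  vec1 = c("A", "B"),
  vec2 = c("A", "B", "C"),
  vec3 = c("A", "B", "C", "D")
)

# The question's list column x, written as a plain R list
x <- list(c("A", "B"), c("A", "B", "C"), c("A", "B", "C", "D"), c("A", "B"))

# Hypothetical helper: name of the first reference vector identical to v,
# or NA when no reference vector matches
which_ref <- function(v, refs) {
  hits <- vapply(refs, identical, logical(1), v)
  if (any(hits)) names(refs)[which(hits)[1]] else NA_character_
}

# Apply the helper to every row's vector
y <- vapply(x, which_ref, character(1), refs = refs)
print(y)  # "vec1" "vec2" "vec3" "vec1"
```

Inside the question's pipeline, this would become `df %>% mutate(y = vapply(x, which_ref, character(1), refs = refs))`. Adding a new reference vector then only means adding an entry to `refs`.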


Here's a more programmatic alternative: you don't need to refer to each vector explicitly.

Data

df <- tibble::tribble(~x,
                      c("A", "B"),
                      c("A", "B", "C"),
                      c("A", "B", "C", "D"),
                      c("A", "B"))

vec1 <- c("A", "B")
vec2 <- c("A", "B", "C")
vec3 <- c("A", "B", "C", "D")

Solution: use ls() with a name pattern to collect the relevant vectors, then look up each row's vector with match().

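library(dplyr)

vecs <- ls(pattern = "^vec[0-9]+$")  # names of the reference vectors only (not e.g. "vecs")
L <- lapply(vecs, get)               # their values, as a list
names(L) <- vecs
df %>%
  mutate(y = names(L)[match(x, L)])  # position of each row's vector in L, mapped to its name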

# # A tibble: 4 x 2
#   x         y
#   <list>    <chr>
# 1 <chr [2]> vec1
# 2 <chr [3]> vec2
# 3 <chr [4]> vec3
# 4 <chr [2]> vec1
answered Oct 09 '22 03:10 by CPak
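A note on why match(x, L) works here. According to R's ?match documentation, lists are converted to character vectors before matching, so each row's vector is effectively compared by its full contents. In the sketch below, mget() stands in for the lapply(get) plus names() steps, because it already returns a named list. The anchored pattern is an assumption, added so that other objects whose names merely contain "vec" are not picked up. The list column is written out as a plain list so the example runs on its own.

```r
vec1 <- c("A", "B")
vec2 <- c("A", "B", "C")
vec3 <- c("A", "B", "C", "D")

# mget() returns a named list, so no separate names() step is needed;
# the anchored pattern matches only vec1, vec2, vec3, ...
L <- mget(ls(pattern = "^vec[0-9]+$"))

# The question's list column x, written as a plain R list
x <- list(c("A", "B"), c("A", "B", "C"), c("A", "B", "C", "D"), c("A", "B"))

# match() converts both lists to character before comparing,
# then the matched positions are mapped back to vector names
y <- names(L)[match(x, L)]
print(y)  # "vec1" "vec2" "vec3" "vec1"
```

Rows that match none of the vectors get NA, the same as the case_when() version.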