Remove column if it is equal to another column and has certain name part

Question

I have this data frame:

df <- data.frame(a.paid = c("A", "A", "B"), a.written = c("A", "A", "B"),
                 b.paid = c(1, 3, 4), b.written = c(1, 3, 4),
                 c.paid = c(1, 3, 5), c.written = c(2, 3, 5),
                 t = c(1, 1, 1))

and I would like to reduce the number of columns with a test. For example, if the entire a.paid column is equal to a.written, then remove a.written. This should apply across the whole data frame, to every pair of columns whose names differ only in the paid/written part.

Preferably with dplyr.

Frankly, I do not know how to solve this other than doing it "by hand". Neither ChatGPT nor Google was much help in this case, but perhaps my wording and search skills weren't precise enough.

The expected outcome should be this:

  a.paid b.paid c.paid c.written t
1      A      1      1         2 1
2      A      3      3         3 1
3      B      4      5         5 1

Yuriy Saraykin · Accepted Answer

df <- data.frame(a.paid = c("A", "A", "B"), a.written = c("A", "A", "B"),
                 b.paid = c(1, 3, 4), b.written = c(1, 3, 4),
                 c.paid = c(1, 3, 5), c.written = c(2, 3, 5),
                 t = c(1, 1, 1))

library(tidyverse)
slct <- map(df, ~.x) %>% duplicated()

or, in base R (the native pipe |> needs R 4.1 or later):

slct <- lapply(df, function(x) x) |> duplicated()

df[!slct]
#>   a.paid b.paid c.paid c.written t
#> 1      A      1      1         2 1
#> 2      A      3      3         3 1
#> 3      B      4      5         5 1

Created on 2023-08-31 with reprex v2.0.2
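
One thing worth keeping in mind with this approach: duplicated() flags any column that equals an earlier column, whatever its name, so it is not limited to paid/written pairs. A minimal sketch of that behaviour, assuming a hypothetical extra column u (not part of the question) that happens to equal t:

```r
# df2 is a cut-down variant of the question's data frame;
# the column u is hypothetical and only added to show the effect
df2 <- data.frame(a.paid = c("A", "A", "B"), a.written = c("A", "A", "B"),
                  t = c(1, 1, 1),
                  u = c(1, 1, 1))

# duplicated() on the list of columns marks every column that is
# equal to some earlier column, regardless of the column names
slct2 <- duplicated(as.list(df2))

# columns that df2[!slct2] would drop
dropped <- names(df2)[slct2]
print(dropped)
```

Here u would be dropped together with a.written, which is fine for the data in the question but may not be wanted if unrelated columns can coincide.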

benson23 · Answer

To make sure we only compare matching pairs of columns, we can use across().

library(tidyverse)

target_cols <- 
  df %>% mutate(
    # compare each "paid" column with its "written" partner, if one exists
    across(ends_with("paid"), ~if (sub("paid", "written", cur_column()) %in% colnames(df)) .x == get(sub("paid", "written", cur_column())) else TRUE), 
    # if the paid comparison is all TRUE, set the written column to FALSE
    across(ends_with("written"), ~ifelse(sub("written", "paid", cur_column()) %in% colnames(df), 
                                         ifelse(all(get(sub("written", "paid", cur_column()))), FALSE, TRUE), 
                                         TRUE)), 
    # catch anything that's not in pairs
    across(where(~!is.logical(.x)), ~ifelse(!is.null(.x), TRUE, FALSE))) %>% 
  select(where(~any(.x))) %>% 
  colnames()

# use indexing to get the correct columns
df[,target_cols]
#>   a.paid b.paid c.paid c.written t
#> 1      A      1      1         2 1
#> 2      A      3      3         3 1
#> 3      B      4      5         5 1
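
The same pair-wise idea can also be written without across(), by matching the column names first and then comparing each pair. This is only a sketch under the assumption that partner columns differ exactly in a trailing ".paid" / ".written" suffix; the helper names written, paid, is_dup and redundant are made up for illustration:

```r
df <- data.frame(a.paid = c("A", "A", "B"), a.written = c("A", "A", "B"),
                 b.paid = c(1, 3, 4), b.written = c(1, 3, 4),
                 c.paid = c(1, 3, 5), c.written = c(2, 3, 5),
                 t = c(1, 1, 1))

# ".written" columns and the names their ".paid" partners would have
written <- grep("\\.written$", names(df), value = TRUE)
paid    <- sub("\\.written$", ".paid", written)

# keep only the pairs whose ".paid" partner really exists
has_partner <- paid %in% names(df)
written <- written[has_partner]
paid    <- paid[has_partner]

# a ".written" column is redundant when it is identical to its partner
is_dup <- vapply(seq_along(written),
                 function(i) identical(df[[written[i]]], df[[paid[i]]]),
                 logical(1))
redundant <- written[is_dup]

# drop the redundant columns, keeping the original column order
result <- df[setdiff(names(df), redundant)]
print(result)
```

With dplyr loaded, the last step could instead be written as select(df, -all_of(redundant)). Note that identical() is stricter than ==: an integer column and a double column holding the same values would not count as equal here.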

Tags: dataframe, r, dplyr

CodingCat
