
Remove column if it is equal to another column and has certain name part

Tags: dataframe, r, dplyr

I have this data frame

df <- data.frame(a.paid = c("A", "A", "B"), a.written = c("A", "A", "B"),
                 b.paid = c(1, 3, 4), b.written = c(1, 3, 4),
                 c.paid = c(1, 3, 5), c.written = c(2, 3, 5),
                 t = c(1, 1, 1))

and I want to reduce the number of columns with a test: if the entire column a.paid is equal to a.written, remove a.written. This should apply across the whole data frame, to every pair of columns whose names differ only in the paid/written suffix.

Preferably with dplyr.

Frankly, I do not know how to solve this problem other than by hand. ChatGPT was not very helpful in this case, and neither was Google, but perhaps my wording and searching skills weren't precise enough.

The expected outcome should be this:

  a.paid b.paid c.paid c.written t
1      A      1      1         2 1
2      A      3      3         3 1
3      B      4      5         5 1
CodingCat asked Dec 06 '25 07:12


2 Answers

df <- data.frame(a.paid = c("A", "A", "B"), a.written = c("A", "A", "B"),
                 b.paid = c(1, 3, 4), b.written = c(1, 3, 4),
                 c.paid = c(1, 3, 5), c.written = c(2, 3, 5),
                 t = c(1, 1, 1))

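library(tidyverse)
# treat the data frame as a list of columns; duplicated() then flags every
# column that is identical to an earlier one
slct <- map(df, ~.x) %>% duplicated()

or, in base R:

slct <- lapply(df, function(x) x) |> duplicated()

# keep only the columns that are not duplicates
df[!slct]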
#>   a.paid b.paid c.paid c.written t
#> 1      A      1      1         2 1
#> 2      A      3      3         3 1
#> 3      B      4      5         5 1

Created on 2023-08-31 with reprex v2.0.2
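
Note that duplicated() on a list of columns compares each column with all earlier ones, regardless of name, so it is not limited to paid/written pairs. A small sketch of that behaviour, using a hypothetical data frame df2 with an extra column u that is not part of the question:

```r
# df2 and its column u are made up for illustration only
df2 <- data.frame(a.paid = c("A", "A", "B"), a.written = c("A", "A", "B"),
                  t = c(1, 1, 1), u = c(1, 1, 1))

# as.list() gives the same list of columns as map(df2, ~.x)
dup <- duplicated(as.list(df2))

# u is flagged as well, because it is identical to t
names(df2)[dup]
#> [1] "a.written" "u"

kept <- df2[!dup]
```

If unrelated columns can legitimately hold identical values, a test that only looks at paid/written pairs is probably the safer choice.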

Yuriy Saraykin answered Dec 09 '25 03:12


To make sure we are only comparing pairs of columns, we can make use of across.

library(tidyverse)

target_cols <- 
  df %>% mutate(
    # compare each "paid" column with its "written" partner, if one exists
    across(ends_with("paid"), ~if (sub("paid", "written", cur_column()) %in% colnames(df)) .x == get(sub("paid", "written", cur_column())) else TRUE), 
    # if the paired "paid" column is all TRUE, set "written" to FALSE
    across(ends_with("written"), ~ifelse(sub("written", "paid", cur_column()) %in% colnames(df), 
                                         ifelse(all(get(sub("written", "paid", cur_column()))), FALSE, TRUE), 
                                         TRUE)), 
    # mark every remaining (non-logical) column as TRUE so it is kept
    across(where(~!is.logical(.x)), ~ifelse(!is.null(.x), TRUE, FALSE))) %>% 
  select(where(~any(.x))) %>% 
  colnames()

# use indexing to get the correct columns
df[,target_cols]
#>   a.paid b.paid c.paid c.written t
#> 1      A      1      1         2 1
#> 2      A      3      3         3 1
#> 3      B      4      5         5 1
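
The same pairwise idea can also be written without mutating the data: build the paid/written name pairs first, then test each pair once. This is only a sketch under the assumption that pairs are identified by the suffixes ".paid" and ".written"; the helper names (paid, written, redundant, drop_cols, result) are made up for illustration. It keeps every paid column unconditionally and only ever drops written columns.

```r
df <- data.frame(a.paid = c("A", "A", "B"), a.written = c("A", "A", "B"),
                 b.paid = c(1, 3, 4), b.written = c(1, 3, 4),
                 c.paid = c(1, 3, 5), c.written = c(2, 3, 5),
                 t = c(1, 1, 1))

# all *.paid columns and the names their *.written partners would have
paid    <- grep("\\.paid$", names(df), value = TRUE)
written <- sub("\\.paid$", ".written", paid)

# keep only the pairs whose partner column actually exists
has_partner <- written %in% names(df)
paid    <- paid[has_partner]
written <- written[has_partner]

# a written column is redundant when it is identical to its paid partner;
# identical() is strict, so an integer column never matches a double column
redundant <- vapply(seq_along(paid),
                    function(i) identical(df[[paid[i]]], df[[written[i]]]),
                    logical(1))
drop_cols <- written[redundant]

# base R subsetting; with dplyr loaded, select(df, -all_of(drop_cols))
# should give the same columns
result <- df[, setdiff(names(df), drop_cols), drop = FALSE]
names(result)
#> [1] "a.paid"    "b.paid"    "c.paid"    "c.written" "t"
```

Because the comparison is restricted to named pairs, an unrelated column that happens to equal another one (for example a second constant column like t) is left alone.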
benson23 answered Dec 09 '25 03:12