
Compare two columns to return equal values in both

Tags: r

I have four ID columns in my table (called MGIT): Ext_ID_1 through _4. Sometimes the same number appears in different rows across these columns, and I need to keep only those rows for further analysis. Example:

Starting state:

Ext_ID      Ext_ID_4

1111          2222
3333          4444
5555          1111
6666          7777
8888          9999
9999          1010

Desired filtered outcome:

Ext_ID      Ext_ID_4

1111          2222
5555          1111
8888          9999
9999          1010

I would only like to compare them two at a time for clearer management.

Through previous questions in StackOverflow I have found the following code:

Dupl = MGIT[,c('Ext_ID','Ext_ID_4')]
Result <- MGIT[duplicated(Dupl) | duplicated(Dupl, fromLast=TRUE),]

But it only returns values that are duplicated within a column (actually, only within Ext_ID, though I believe Ext_ID_4 has no duplicated values within itself).

I am a beginner at programming and know nothing about the language.

Barbara Perez de Araújo asked Feb 05 '26 05:02


1 Answer

I assume you want this to work for all four of your columns, and not just the two in your example.

Here is a solution that will work no matter how many "Ext_ID" columns you have:

library(dplyr)

# Recreate the data from your example
df <- tibble::tribble(
  ~Ext_ID,      ~Ext_ID_4,
  1111,          2222,
  3333,          4444,
  5555,          1111,
  6666,          7777,
  8888,          9999,
  9999,          1010
)

# The actual code you need - just replace `df` with the name of your table
df %>% 
  filter(
    if_any(
      starts_with("Ext_ID"),
      ~ .x %>% purrr::map_lgl(~sum(df == .x) > 1)
    )
  )

# The output:
#> # A tibble: 4 x 2
#>   Ext_ID Ext_ID_4
#>    <dbl>    <dbl>
#> 1   1111     2222
#> 2   5555     1111
#> 3   8888     9999
#> 4   9999     1010

Explanation:

  • ~ .x %>% purrr::map_lgl(~sum(df == .x) > 1) checks whether each value in a column shows up more than once anywhere in the data frame.
  • starts_with("Ext_ID") makes sure that this is done for all columns whose names start with "Ext_ID".
  • if_any() makes filter() keep a row when at least one of those columns holds a value that is duplicated somewhere in df.
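
Since the question asks to compare the columns two at a time, here is a minimal base R sketch of that variant. It is not part of the code above: shared_rows() is a hypothetical helper name, and it assumes that "match" means a value that occurs in both of the two chosen columns. Unlike the dplyr version, it ignores values that repeat only within a single column.

```r
# Recreate the example data from the question (base R only)
MGIT <- data.frame(
  Ext_ID   = c(1111, 3333, 5555, 6666, 8888, 9999),
  Ext_ID_4 = c(2222, 4444, 1111, 7777, 9999, 1010)
)

# Hypothetical helper: keep rows whose value in either column
# also appears somewhere in the other column.
shared_rows <- function(data, col_a, col_b) {
  a <- data[[col_a]]
  b <- data[[col_b]]
  shared <- intersect(a, b)          # values present in both columns
  shared <- shared[!is.na(shared)]   # do not treat missing IDs as a match
  data[a %in% shared | b %in% shared, , drop = FALSE]
}

Result <- shared_rows(MGIT, "Ext_ID", "Ext_ID_4")
print(Result)
```

On the example data this keeps the same four rows as the desired outcome in the question. To check another pair, call the helper again with different column names, e.g. shared_rows(MGIT, "Ext_ID_2", "Ext_ID_3"), assuming those columns exist in your table.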
jpiversen answered Feb 06 '26 19:02


