Retrieve the most repeated (x, y) values in two columns in a data frame

Tags:

dataframe

r

I am storing (x, y) values in a dataframe. I want to return the most frequently appearing (x, y) combination.

Here is an example:

> x = c(1, 1, 2, 3, 4, 5, 6)
> y = c(1, 1, 5, 6, 9, 10, 12)
> xy = data.frame(x, y)
> xy
  x  y
1 1  1
2 1  1
3 2  5
4 3  6
5 4  9
6 5 10
7 6 12

The most common (x, y) value would be (1, 1).

I tried the answer here for a single column. It works for a single column, but does not work for an aggregate of two columns.

> tail(names(sort(table(xy$x))), 1)
[1] "1"
> tail(names(sort(table(xy$x, xy$y))), 1)
NULL

How do I retrieve the most repeated (x, y) values in two columns in a data frame in R?

EDIT: c(1, 2) should be considered distinct from c(2, 1).

383

asked Apr 28 '15 13:04

user4605941

2 Answers

Not sure what the desired output should look like, but here's a possible solution:

res <- table(do.call(paste, xy))
res[which.max(res)]
# 1 1 
#   2
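To see what is being tabulated, here is a small sketch using the question's data. It shows the pasted keys and, as an assumption about what might be wanted, splits the winning key back into numbers (`keys`, `top` and `top_pair` are names made up for this illustration). Splitting on a space only works while the values themselves contain no spaces.

```r
xy <- data.frame(x = c(1, 1, 2, 3, 4, 5, 6),
                 y = c(1, 1, 5, 6, 9, 10, 12))

# One "x y" string per row; (1, 2) and (2, 1) give different keys
keys <- do.call(paste, xy)
keys
# "1 1" "1 1" "2 5" "3 6" "4 9" "5 10" "6 12"

res <- table(keys)
top <- names(res)[which.max(res)]   # key with the highest count

# Turn the key back into a numeric pair
top_pair <- as.numeric(strsplit(top, " ", fixed = TRUE)[[1]])
top_pair
# 1 1
```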

In order to get the actual values, one could do

res <- do.call(paste, xy)
# ave() returns, for every row, how often that row's key occurs
xy[which.max(ave(seq_along(res), res, FUN = length)), ]
#   x y
# 1 1 1
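For what it's worth, the intermediate vector returned by `ave()` can be inspected directly. The sketch below uses the question's data; `n_per_row` is just an illustrative name, and the last line is one possible way to list every distinct pair that ties for the maximum rather than only the first.

```r
xy <- data.frame(x = c(1, 1, 2, 3, 4, 5, 6),
                 y = c(1, 1, 5, 6, 9, 10, 12))

keys <- do.call(paste, xy)

# For each row, the number of rows sharing its key
n_per_row <- ave(seq_along(keys), keys, FUN = length)
n_per_row
# 2 2 1 1 1 1 1

# First row belonging to the most frequent pair
first_max <- xy[which.max(n_per_row), ]

# All distinct pairs that reach the maximum count (handles ties)
all_max <- unique(xy[n_per_row == max(n_per_row), ])
```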

134

answered Sep 23 '22 11:09

David Arenburg

(Despite all the plus votes, a hybrid of @DavidArenburg and my approaches

res = do.call("paste", c(xy, sep="\r"))
which.max(tabulate(match(res, res)))

might be simple and effective.)
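A possible reason to prefer a separator such as "\r" over the default space (this is my reading, not something stated above) is that values containing spaces can make different pairs paste to the same key. A minimal sketch with made-up character data:

```r
# Hypothetical data: rows 2 and 3 hold the same pair, row 1 is different
df <- data.frame(x = c("a b", "a", "a"),
                 y = c("c", "b c", "b c"),
                 stringsAsFactors = FALSE)

key_space <- do.call(paste, df)                    # default sep = " "
key_cr    <- do.call(paste, c(df, sep = "\r"))

# With a space, all three rows collapse to "a b c"
collapsed <- length(unique(key_space))             # 1

idx_space <- which.max(tabulate(match(key_space, key_space)))  # 1, misleading
idx_cr    <- which.max(tabulate(match(key_cr, key_cr)))        # 2, as expected

df[idx_cr, ]
```

Any separator that cannot occur in the data would do; "\r" is simply unlikely to appear in typical values.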

Maybe it seems a little roundabout, but a first step is to transform the possibly arbitrary values in the columns of xy into integers ranging from 1 to the number of unique values in each column

x = match(xy[[1]], unique(xy[[1]]))
y = match(xy[[2]], unique(xy[[2]]))

Then encode the combination of columns to unique values

v = x + max(x) * (y - 1L)

Indexing minimizes the range of values under consideration, and encoding reduces a two-dimensional problem to a single dimension. These steps reduce the space required by any tabulation (as with table() in other answers) to the minimum, without creating character vectors.
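As a quick check of the encoding idea, the sketch below enumerates every pair of indices on a small grid and encodes it as a column-major linear index, x + nx * (y - 1). The grid sizes `nx` and `ny` are arbitrary values chosen for illustration. Every pair gets its own code between 1 and nx * ny, and the pair can be recovered from the code.

```r
nx <- 3L   # number of unique x values (illustrative)
ny <- 4L   # number of unique y values (illustrative)

# Every possible (x, y) index pair
grid <- expand.grid(x = seq_len(nx), y = seq_len(ny))

# Column-major linear index of each pair
v <- grid$x + nx * (grid$y - 1L)

no_collisions <- anyDuplicated(v) == 0L   # TRUE: all codes are distinct
code_range <- range(v)                    # 1 12

# Decoding recovers the original indices
x_back <- (v - 1L) %% nx + 1L
y_back <- (v - 1L) %/% nx + 1L
```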

If one wanted the most common occurrence in a single dimension, then one could index and tabulate v

tbl = tabulate(match(v, v))

and find the index of the first occurrence of the maximum value(s), e.g.,

xy[which.max(tbl), ]
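To make the `tabulate(match(v, v))` step concrete, here is a tiny sketch on a made-up vector of codes. `match(v, v)` maps each element to the position of its first occurrence, so the counts end up stored at those positions and `which.max()` returns a row index directly.

```r
v <- c(7L, 3L, 7L, 9L, 3L, 7L)   # illustrative codes

first <- match(v, v)     # position of each value's first occurrence
first
# 1 2 1 4 2 1

tbl <- tabulate(first)   # counts, indexed by first-occurrence position
tbl
# 3 2 0 1

idx <- which.max(tbl)    # 1: the value 7 first appears at position 1
```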

Here's a function to do the magic

whichpairmax <- function(x, y) {
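    # re-index each column to 1..(number of unique values)
    x = match(x, unique(x)); y = match(y, unique(y))
    # column-major code: distinct (x, y) pairs always get distinct values
    v = x + max(x) * (y - 1L)
    # row index of the first occurrence of the most frequent pair
    which.max(tabulate(match(v, v)))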
}

and a couple of tests

> set.seed(123)
> xy[whichpairmax(xy[[1]], xy[[2]]),]
  x y
1 1 1
> xy1 = xy[sample(nrow(xy)),]
> xy1[whichpairmax(xy1[[1]], xy1[[2]]),]
  x y
1 1 1
> xy1
  x  y
3 2  5
5 4  9
7 6 12
4 3  6
6 5 10
1 1  1
2 1  1

For an arbitrary data.frame

whichdfmax <- function(df) {
    v = integer(nrow(df))
    for (col in df) {
        col = match(col, unique(col))
        # unique code for (combination so far, current column)
        v = col + max(col) * (match(v, unique(v)) - 1L)
    }
    which.max(tabulate(match(v, v)))
}
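As a sanity check of the column-by-column idea, the sketch below runs a standalone helper on a made-up three-column data frame with mixed types and compares the result with the paste-based approach. `which_row_max` and `df3` are hypothetical names introduced only so the example runs on its own; the helper uses a column-major encoding at each step.

```r
# Hypothetical standalone helper: fold the columns into one integer code
which_row_max <- function(df) {
    v <- rep(1L, nrow(df))
    for (col in df) {
        col <- match(col, unique(col))   # re-index the current column
        v <- match(v, unique(v))         # re-index the combination so far
        v <- col + max(col) * (v - 1L)   # distinct code per combination
    }
    which.max(tabulate(match(v, v)))
}

# Made-up data: the row (p, 1, TRUE) occurs three times
df3 <- data.frame(a = c("p", "q", "p", "q", "p"),
                  b = c(1, 2, 1, 2, 1),
                  c = c(TRUE, FALSE, TRUE, TRUE, TRUE),
                  stringsAsFactors = FALSE)

i <- which_row_max(df3)
df3[i, ]

# Cross-check against the paste-based approach
keys <- do.call(paste, c(df3, sep = "\r"))
j <- which.max(tabulate(match(keys, keys)))
```

Re-indexing `v` on every pass keeps the codes small, which should help avoid integer overflow when there are many columns.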

answered Sep 22 '22 11:09

Martin Morgan
