
How to order a list by a custom function, discarding duplicates?

Tags: sorting, r

I have this list:

thresholds <- list(
     list(color="red", value=100),
     list(color="blue", value=50),
     list(color="orange", value=100),
     list(color="green", value=1),
     list(color="orange", value=50)
)

I want to order it by the "value" field of each element and discard duplicates so that no two elements have the same "value" field in the resulting list (the element that gets picked when there's a tie doesn't matter).

sort and unique don't work with complex lists and don't permit a custom ordering. How can I achieve the desired result?

asked Nov 08 '19 00:11 by lgeorget


2 Answers

First of all, in this particular case, the actual vector to order is:

values <- sapply(thresholds, function (t) t$value)
# values == c(100, 50, 100, 1, 50)

You can adjust the function inside sapply for your needs (for instance, do the appropriate casting depending on whether you want to sort in numeric or alphabetical order, etc.).
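For instance, to order by the color field alphabetically instead, the key extraction could look like the sketch below. It uses vapply rather than sapply so that the type of each extracted key is checked; the name color_keys is only illustrative.

```r
thresholds <- list(
  list(color = "red", value = 100),
  list(color = "blue", value = 50),
  list(color = "orange", value = 100),
  list(color = "green", value = 1),
  list(color = "orange", value = 50)
)

# Extract the sort key as a character vector; vapply raises an error if an
# element does not yield exactly one string.
color_keys <- vapply(thresholds, function(el) el$color, character(1))

# Indices of the elements in alphabetical order of color.
color_order <- order(color_keys)
print(color_keys[color_order])
```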

From this point, if we were to keep the duplicates, the answer would simply be:

thresholds[order(values)]

order returns the permutation of indices that sorts "values": its first element is the position of the smallest value, its second the position of the next smallest, and so on. Here order(values) is 4 2 5 1 3. Then, thresholds[order(values)] returns the elements of thresholds at these indices, in that order, so their values are 1 50 50 100 100.
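Note that order is not the same as rank, which is easy to check on a small vector (x below is an arbitrary example, not taken from the question):

```r
x <- c(30, 10, 20)

order(x)     # 2 3 1: positions of the elements, smallest first
rank(x)      # 3 1 2: where each element would land once sorted
x[order(x)]  # 10 20 30
```

Indexing with order(x) sorts the vector; indexing with rank(x) generally does not.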

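However, since we want to remove duplicates, it cannot be as simple as that. unique applied to thresholds compares whole elements (color and value together), so it does not treat two elements with the same value as duplicates, and if we apply it to values, we lose the correspondence with the indices in the original list.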

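The solution is to use another function, namely duplicated. When applied to a vector, duplicated returns a logical vector indicating, for each element, whether the same value already appears at an earlier position. For instance, duplicated(values) returns FALSE FALSE TRUE FALSE TRUE. That mask is aligned with the original positions, whereas order(values) lists indices in sorted order, so the mask has to be computed on the sorted values, values[ordering], before it is used to filter ordering. Otherwise an input such as c(2, 1, 1) would keep both 1s and drop the 2.

The solution is therefore:

ordering <- order(values)
nodups <- ordering[!duplicated(values[ordering])]
thresholds[nodups]

or as a one-liner (assuming values contains no NA, which sort would drop):

thresholds[order(values)[!duplicated(sort(values))]]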
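As a self-contained sketch, the idea can be wrapped in a small helper that first keeps the first occurrence of each key and then sorts the survivors. The name order_unique_by and its arguments are hypothetical, and numeric keys are assumed. It is checked on the question's data and on an input whose duplicates come after a larger value.

```r
# Hypothetical helper: sort `lst` by key(element), keeping one element per key.
order_unique_by <- function(lst, key) {
  keys <- vapply(lst, key, numeric(1))
  keep <- which(!duplicated(keys))   # original positions of first occurrences
  lst[keep[order(keys[keep])]]       # sort only the kept positions
}

thresholds <- list(
  list(color = "red", value = 100),
  list(color = "blue", value = 50),
  list(color = "orange", value = 100),
  list(color = "green", value = 1),
  list(color = "orange", value = 50)
)

get_value <- function(el) el$value
result <- order_unique_by(thresholds, get_value)
result_values <- vapply(result, get_value, numeric(1))   # 1 50 100

# An input where the duplicated key appears after a larger one.
tricky <- list(list(value = 2), list(value = 1), list(value = 1))
tricky_values <- vapply(order_unique_by(tricky, get_value), get_value, numeric(1))  # 1 2
```

Deduplicating before sorting keeps the first occurrence in the original list for each value, which is acceptable here since the question says the choice on ties does not matter.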
answered Nov 03 '22 19:11 by lgeorget


Adding another alternative, for completeness, regarding the "custom sort"/"custom unique" part of the question. By defining methods for certain functions (as seen in ?xtfrm) we can apply custom sort and unique functions to any list (or other object).

First, a "class" attribute needs to be added:

class(thresholds) = "thresholds"

Then, define the necessary custom functions:

"==.thresholds" = function(x, y) return(x[[1]][["value"]] == y[[1]][["value"]])
">.thresholds" = function(x, y) return(x[[1]][["value"]] > y[[1]][["value"]])
"[.thresholds" = function(x, i) return(structure(.subset(x, i), class = class(x)))
is.na.thresholds = function(x) return(is.na(x[[1]][["value"]]))

Now, we can apply sort:

sort(thresholds)

Finally, add a custom unique function:

duplicated.thresholds = function(x, ...) return(duplicated(sapply(x, function(elt) elt[["value"]])))
unique.thresholds = function(x, ...) return(x[!duplicated(x)])

And:

sort(unique(thresholds))
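A possibly shorter variant relies on xtfrm itself, which is documented as a generic returning a numeric vector that sorts in the same order as x. The sketch below is an alternative to the == and > methods, not the approach described above; it assumes the methods are defined at top level so that S3 dispatch can find them, and it keeps a `[` method so that the class survives subsetting.

```r
thresholds <- structure(
  list(
    list(color = "red", value = 100),
    list(color = "blue", value = 50),
    list(color = "orange", value = 100),
    list(color = "green", value = 1),
    list(color = "orange", value = 50)
  ),
  class = "thresholds"
)

# Numeric proxy used by order() and sort() for objects of this class.
xtfrm.thresholds <- function(x) {
  vapply(unclass(x), function(el) el[["value"]], numeric(1))
}

# Keep the class when subsetting, so the result is still a "thresholds" object.
"[.thresholds" <- function(x, i) {
  structure(.subset(x, i), class = class(x))
}

sorted <- sort(thresholds)
sorted_values <- vapply(unclass(sorted), function(el) el[["value"]], numeric(1))
print(sorted_values)  # 1 50 50 100 100
```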

(Similar answers and more information here and here)

answered Nov 03 '22 19:11 by alexis_laz