Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to sort a data.table based on a set of inequality constraints?

I have a set of "x < y" inequality constraints and I would like to sort the rows of a data.table based on these.

For example,

library(data.table)
set.seed(0)
ineqs <- unique(data.table(
  X = sample(letters, 10, replace = T),
  Rel = "<",
  Y = sample(letters, 10, replace = T)
))
ineqs
    X Rel Y
 1: x   < b
 2: g   < f
 3: j   < e
 4: o   < r
 5: x   < j
 6: f   < u
 7: x   < m
 8: y   < s
 9: r   < z
10: q   < j

So, if I start with a table of sorted letters,

dt <- data.table(Foo = letters)
    Foo
 1:   a
 2:   b
 3:   c
---    
24:   x
25:   y
26:   z

How can I adjust the row order to satisfy my constraints? Also, I am certain that my constraints are valid (i.e. none of the constraints contradict each other).

like image 731
Ben Avatar asked Dec 17 '22 20:12

Ben


2 Answers

library(igraph)
g = ineqs[, graph_from_edgelist(cbind(X,Y), directed=TRUE)]
o = names(topo_sort(g))

dt[, v := factor(Foo, levels = o, ordered=TRUE)]
dt[order(v)]


    Foo    v
 1:   x    x
 2:   g    g
 3:   o    o
 4:   y    y
 5:   q    q
 6:   b    b
 7:   m    m
 8:   f    f
 9:   r    r
10:   s    s
11:   j    j
12:   u    u
13:   z    z
14:   e    e
15:   a <NA>
16:   c <NA>
17:   d <NA>
18:   h <NA>
19:   i <NA>
20:   k <NA>
21:   l <NA>
22:   n <NA>
23:   p <NA>
24:   t <NA>
25:   v <NA>
26:   w <NA>
    Foo    v

All of the terms that aren't in ineqs are sorted to the end.

If the graph of your relation has cycles, you should get a warning in topo_sort. This tells you your task is not well defined for some terms in ineqs.

like image 51
Frank Avatar answered May 27 '23 03:05

Frank


Perhaps I misunderstood but this is not a trivial sort, and there doesn't necessarily exist one unique order.

Let me give you an example. Consider the conditions

    X Rel Y
1: x   < b
2: g   < f

Various orders are conceivable

        x < g < f < b
g     < x     <     b   <   f
g     < x     < f < b
g < f < x         < b
        x < g <     b     < f
        x <         b < g < f

all of which satisfy the conditions laid out in the first two lines.


I was interested in seeing how an exhaustive & crude implementation would do, where we pre-calculate all possible permutations and then eliminate those that do not fulfil the pairwise conditions.

To illustrate, we will use 4 letters only and the first two lines of the pairwise condition data.

Here are my results:

  1. To start, we define the four letters and calculate all permutations using gtools::permutations.

    char <- c("b", "f", "g", "x")
    
    library(gtools)
    perm <- as.data.frame(permutations(length(char), length(char), char))
    

    There are 24 possible permutations.

  2. We now read in the pairwise condition data

    df <- read.table(text =
        "X Rel Y
    x   < b
    g   < f", header = T)
    
    # Convert factors to character vectors
    df[] <- sapply(df, as.character)
    
  3. We now loop throw the permutations and the pairwise conditions and flag those rows in the permutation data that do not satisfy any of the pairwise conditions.

    rmv <- c()
    for (i in 1:nrow(perm)) {
        # Here we loop throw all possible permutations and eliminate those that
        # do not fulfil the pairwise conditions
        for (j in 1:nrow(df)) {
            # Here we loop throw the pairwise conditions
            cond <- eval(parse(text = sprintf("`%s`", df[j, "Rel"])))(
                which(perm[i, ] == df[j, "X"]),
                which(perm[i, ] == df[j, "Y"]))
            if (cond == FALSE) {
                rmv <- c(rmv, i)
                break
            }
        }
    }
    
  4. The remaining permutations that satisfy the conditions are then

    perm[-rmv, ]
    #   V1 V2 V3 V4
    #16  g  f  x  b
    #17  g  x  b  f
    #18  g  x  f  b
    #20  x  b  g  f
    #23  x  g  b  f
    #24  x  g  f  b
    
like image 26
Maurits Evers Avatar answered May 27 '23 05:05

Maurits Evers