
How to sort non-numerical objects in Java when only some relationships are known?

Tags: java, sorting

I have a list of items:

[foo, bar, baz, boo, abc, xyz]

Some of these items want to be sorted in a specific order:

foo after abc
xyz before baz

The order of the other items does not matter as long as all given rules are respected.

Here are some of the possible sorted orders:

[abc, foo, xyz, baz, bar, boo]
[abc, xyz, foo, baz, bar, boo]
[abc, foo, bar, boo, xyz, baz]
[xyz, baz, bar, boo, abc, foo]

Using a Comparator does not seem to work, as it is possible to engineer a list that will cause it to fail. For instance, if our compare method looks like this:

list.sort((a, b) -> {
    if (a.isAfter(b)) {
        return 1;
    } else if (a.isBefore(b)) {
        return -1;
    }
    return 0;
});

and we run it against [foo, bar, baz, boo, abc, xyz], the comparisons go as follows:

Comparing 'bar' to 'foo': no rule present -> 0
Comparing 'baz' to 'bar': no rule present -> 0
Comparing 'boo' to 'baz': no rule present -> 0
Comparing 'abc' to 'boo': no rule present -> 0
Comparing 'xyz' to 'abc': no rule present -> 0

The Comparator will run, but it will just spit out the same list you started with. It seems that for a Comparator to work correctly, you need to know the relationship between any two items in the list, not just some of them.

Knowing this, one solution would be to move all elements that have rules to a separate list, sort that one, and then merge it with the rest of the elements. This way we do know the relationship between all the items we are comparing. However, for this to work, you have to create separate lists for every single rule. Otherwise there is a likelihood that you could run into the exact same problem again:

[foo, bar, baz, boo, abc, xyz] // original
[foo, baz, abc, xyz] // elements with rules
[foo, baz, abc, xyz] // elements with rules **after comparator**
[foo, baz, abc, xyz, bar, boo] // merged with the rest, rules not satisfied

Creating lists for every single rule that might appear is not very elegant. Is there another sorter I can use that accommodates the kind of behavior I am looking for?

otoomey asked Nov 30 '25 10:11


2 Answers

Your best bet is to "compile" all of the rules into a single list. For example, the two rules mentioned above would generate this list:

["abc", "foo", "xyz", "baz"]

(or ["xyz", "baz", "abc", "foo"]; you'll get a different result, but the rules will still be followed).

You will always be able to do this unless there's a cycle of rules, in which case they are impossible to follow. ("abc goes before def, def goes before ghi, ghi goes before abc" is an example of an impossible ruleset).

But if they aren't impossible, then you can compile them into a list: basically all of the named terms in rank order. Your comparator then just compares positions in that list, with a negative position for any item that is not in the list.

With Java 9 or later (List.of arrived in Java 9), you can write that comparator easily like so:

List<String> rules = List.of("abc", "foo", "xyz", "baz");
Comparator<String> comparator = Comparator.comparing((String s) -> rules.indexOf(s));

And then you're off to the races. This comparator sorts the items by their index using a key extraction function: a function that turns each value into another value, which the sort then compares. Since rules.indexOf() returns -1 for items not mentioned in the rules, and zero or higher for items mentioned in the rules, non-mentioned items will always go at the front, followed by items mentioned in the rules in rule order.
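To make that concrete, here is a small runnable sketch that applies the comparator to the list from the question. The class and method names (RuleOrderDemo, sortByRules) are made up for illustration, and the rule list is the hand-compiled one from above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class RuleOrderDemo {
    // Hand-compiled rule order: abc before foo, xyz before baz.
    static final List<String> RULES = List.of("abc", "foo", "xyz", "baz");

    // Returns a sorted copy. Items that are not in RULES get the key -1,
    // so they sort ahead of every item that is mentioned in a rule.
    static List<String> sortByRules(List<String> items) {
        List<String> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparing((String s) -> RULES.indexOf(s)));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> items = List.of("foo", "bar", "baz", "boo", "abc", "xyz");
        System.out.println(sortByRules(items));
        // prints [bar, boo, abc, foo, xyz, baz]
    }
}
```

Note that indexOf is a linear scan, so for a long rule list you may prefer to precompute a map from item to position.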

(If you'd prefer items not mentioned in the rules to go at the end, then your key extractor function needs to use contains, and return Integer.MAX_VALUE for items not in the list.)
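A minimal sketch of that variant, assuming the same hand-compiled rule list; the names UnmentionedLast, rank and sortUnmentionedLast are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class UnmentionedLast {
    static final List<String> RULES = List.of("abc", "foo", "xyz", "baz");

    // Key extractor: the rule position for mentioned items,
    // Integer.MAX_VALUE for everything else so it sorts to the end.
    static int rank(String s) {
        return RULES.contains(s) ? RULES.indexOf(s) : Integer.MAX_VALUE;
    }

    // Returns a sorted copy with unmentioned items after the ruled ones.
    static List<String> sortUnmentionedLast(List<String> items) {
        List<String> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparingInt(UnmentionedLast::rank));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> items = List.of("foo", "bar", "baz", "boo", "abc", "xyz");
        System.out.println(sortUnmentionedLast(items));
        // prints [abc, foo, xyz, baz, bar, boo]
    }
}
```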

Since List.sort is a stable sort (the implementation for objects is TimSort), all of the values with an index of -1 will stay in the same relative order they were in before the list was sorted.

Update: How to "compile" the rules into a list

This can be done by taking advantage of the stable sort algorithm. Add each item mentioned in a rule into a list, and then sort that list once by each individual rule in isolation. For example: the rule "foo after abc" would be a key extractor function that returns 0 for abc, 1 for foo, and Integer.MAX_VALUE for everything else.

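Once you've sorted the list once for each rule, check every rule again in a single pass to ensure that all of them still hold. If any don't, either the ruleset is impossible or a later sort has undone an earlier rule, which can happen when rules share items (for example "a before b" and "b before c"); in that case a topological sort of the rules gives a definitive answer.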
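A more general way to do the compile step is a topological sort. The sketch below uses Kahn's algorithm, which is a swapped-in standard technique rather than the stable-sort approach described above. The class name RuleCompiler and the choice to represent each rule as a two-element array {before, after} are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RuleCompiler {
    // Each rule is {before, after}: 'before' must come ahead of 'after'.
    static List<String> compile(List<String[]> rules) {
        Map<String, List<String>> successors = new LinkedHashMap<>();
        Map<String, Integer> pending = new LinkedHashMap<>(); // unmet predecessors per item
        for (String[] rule : rules) {
            successors.computeIfAbsent(rule[0], k -> new ArrayList<>()).add(rule[1]);
            successors.computeIfAbsent(rule[1], k -> new ArrayList<>());
            pending.putIfAbsent(rule[0], 0);
            pending.merge(rule[1], 1, Integer::sum);
        }

        // Start with the items that nothing has to precede.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String item = ready.remove();
            order.add(item);
            // Emitting an item satisfies one predecessor of each successor.
            for (String next : successors.get(item)) {
                if (pending.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }

        // Items left unemitted are part of a cycle, so the rules are impossible.
        if (order.size() != pending.size()) {
            throw new IllegalArgumentException("rules contain a cycle");
        }
        return order;
    }
}
```

Calling compile with the two rules from the question ({"abc", "foo"} and {"xyz", "baz"}) yields a list that can be passed straight to the indexOf comparator shown earlier.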

Sean Reilly answered Dec 02 '25 00:12


There are multiple solutions for this type of problem.

One solution is to do the sort manually: iterate over the array and, when you see a foo, search the rest of the array for every abc and move it in front of that foo (or the reverse: when you see an abc, search the part of the array already visited for every foo and move it after that abc).

Another solution is to do multiple sorts (one for every rule, in this case 2), each time creating an array of pairs [value, number] where the number depends on the value from the original array. For the first rule, we could have:

  • foo => 1
  • abc => 0
  • all other values => the last used value, or 0.

So the array [foo, bar, baz, boo, abc, xyz] will be translated into [(foo,1), (bar,1), (baz,1), (boo,1), (abc,0), (xyz,0)]. When we sort it using the numbers in the pairs, we get the following array: [(abc,0), (xyz,0), (foo,1), (bar,1), (baz,1), (boo,1)], which satisfies the first rule.

Now if we apply the second rule (xyz=>0, baz=>1), we get the following array: [(abc,0), (xyz,0), (foo,0), (bar,0), (baz,1), (boo,1)]. It is already in order, so both rules now hold for this input.
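The two passes above can be sketched as follows. The names PairSort and applyRule are hypothetical, and the pairs are modelled with Map.entry (Java 9+) purely for brevity. This is only a sketch of the idea: a later pass can move items that an earlier rule had placed, so it is worth re-checking every rule after the last pass.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

class PairSort {
    // One stable pass for the rule "'before' comes ahead of 'after'".
    static List<String> applyRule(List<String> items, String before, String after) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        int last = 0; // "the last used value, or 0"
        for (String item : items) {
            if (item.equals(before)) {
                last = 0;
            } else if (item.equals(after)) {
                last = 1;
            }
            pairs.add(Map.entry(item, last));
        }
        // List.sort is stable, so pairs with equal numbers keep their order.
        pairs.sort(Comparator.comparingInt((Map.Entry<String, Integer> p) -> p.getValue()));

        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> p : pairs) {
            result.add(p.getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> items = List.of("foo", "bar", "baz", "boo", "abc", "xyz");
        List<String> pass1 = applyRule(items, "abc", "foo"); // foo after abc
        List<String> pass2 = applyRule(pass1, "xyz", "baz"); // xyz before baz
        System.out.println(pass2);
        // prints [abc, xyz, foo, bar, baz, boo]
    }
}
```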

You can improve this by giving each value a tuple of [number of rules]+1 elements (the value plus one number per rule), assigning all the numbers the first time, and then applying the sort function once for every rule, each time choosing that rule's element of the tuple to sort on.

Depending on the number of rules and the size of the array, the first method can be better than the second.

If you have a lot of rules and a small array, I think I would prefer the first method. Conversely, if you have few rules and a big array, I would suggest the second method.

The reason for this choice is simple: if you have a big array, the second method relies on the sort function built into the language, which is faster. If you have a lot of rules, on the other hand, the second method means calling the sort function many times, whereas the first method costs about the same no matter how many rules there are.

Loïc France answered Dec 02 '25 01:12


