Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

frequently and most recently used items ranking algorithm

I'm searching for an ranking algorithm that will sort items (files, applications, visited websites...) by the amount of usages and also most recent usages.

eg in an application launcher, user types some short prefix of the application name and apps that meets the conditions will be ranked. app A was users favorite app and was used very often, but now he is using app B often and just sometimes uses app A. app A was launched way more times than app B, but B has more usages than A in the last time.

so app B is ranked before app A.

furthermore, if app C wants to outrun app B, it must be used (in the recent time) more often, but for app A to be the first, it doesn't need so much usages, because it is the users favorite app and has more usages in the past then other apps.

I don't know if this is a good explanation of what I want, but I hope that some will understand.

like image 874
microo8 Avatar asked Feb 05 '23 07:02

microo8


1 Answers

I think you can achieve this using an implementation of a cache replacement policy.

These algorithms help computer processors (CPUs) determine which sections ("pages") of main memory (RAM) to keep in a cache (L1, L2 etc) -- which the CPU can access much faster than RAM. But they can be adapted to your problem very easily.

An algorithm which sorts items by most recent usage would be similar to the cache policy LRU which expires/replaces the Least-Recently-Used page when the cache is full.

An algorithm which sorts items by most frequent usage (here "frequent" really means "number of uses") would be similar to the cache policy LFU which expires/replaces the Least-Frequently-Used page when the cache is full.

There are several policies that explicitly or implicitly combine both concepts in the manner you are requesting. Some also involve "time" (either in terms of actual computer clock time or simply an incrementing counter of page requests etc) to obtain a better sense of the "age" or "frequency" of page access (as opposed to simply number of uses).

These become slightly more complicated and harder to implement, especially if you need a very efficient algorithm. But for most uses related to user-interfaces, even an inefficient algorithm should be plenty fast, because the number of items are small and the user will modify the list very infrequently.

One example of an algorithm that might work for you is the LRFU (Least Recently/Frequently Used) policy that directly seeks to combine LRU and LFU to expire pages based on a formula that combines both recency and frequency of use. You can find a reference to this on the same page listed above. You can also see the scholarly article where it is reported.

The article implements it using a combination of a heap and a linked-list data structure, and it contains some pseudo-code for the implementation.

For your uses, you could probably write a much simpler, but less efficient algorithm fairly easily.

For example you could simply store an array of objects with 2 properties:

  1. Value (the thing you care about -- e.g. website or file name etc)
  2. Score (discussed below -- called "CRF" in the article)

Whenever a Value is selected by the user, you modify the list as follows:

  1. Update the Scores of all existing Items in the list by multiplying each Score by a constant weighting factor, USAGE_WEIGHT (a number between 0.5 and 1.0, discussed more below).
  2. Search the list to see if the recenty-selected item already exists. If it does, simply add 1.0 to its existing Score. If it doesn't, then create a new Item with an initial Score of 1.0 and add it to the list. If there is no more room in the list (meaning it already contains the maximum number of MRU items you want to display) then first remove the item in the list with the lowest score (which will always be at the end of the list, due to the next step).
  3. Now re-sort the list by Score in descending order. (Higest scores should appear first in the MRU list).

The USAGE_WEIGHT factor determines how imporant "recency" is compared to "frequency" (aka usage count). A value of 0.5 will cause the list to have a fully-LRU behavior (where only recency matters), while a value of 1.0 will cause the list to have a fully-LFU behavior (where only usage-count matters). An intermediate value, for example 0.9, will cause the list to have a mixture of LRU and LFU behavior, as shown by the example output below.

In each scenario below, the "Values" are letters, and they are added in this order:

A B C B A A D A C D A B D E C B A  

After each addition, I list the letter that was added along with the current MRU list in quotes (e.g. "DBA"). The maximum MRU size is 3. I also list a more detailed representation of the list, showing the Value (Letter) and Score of each item in the form { Letter, Score }.

USAGE_WEIGHT = 1.0 (Fully LFU -- only usage-count matters)

  1. (Added A) "A"      [ { A, 1.0 } ]
  2. (Added B) "AB"     [ { A, 1.0 } { B, 1.0 } ]
  3. (Added C) "ABC"    [ { A, 1.0 } { B, 1.0 } { C, 1.0 } ]
  4. (Added B) "BAC"    [ { B, 2.0 } { A, 1.0 } { C, 1.0 } ]
  5. (Added A) "BAC"    [ { B, 2.0 } { A, 2.0 } { C, 1.0 } ]
  6. (Added A) "ABC"    [ { A, 3.0 } { B, 2.0 } { C, 1.0 } ]
  7. (Added D) "ABD"    [ { A, 3.0 } { B, 2.0 } { D, 1.0 } ]
  8. (Added A) "ABD"    [ { A, 4.0 } { B, 2.0 } { D, 1.0 } ]
  9. (Added C) "ABC"    [ { A, 4.0 } { B, 2.0 } { C, 1.0 } ]
 10. (Added D) "ABD"    [ { A, 4.0 } { B, 2.0 } { D, 1.0 } ]
 11. (Added A) "ABD"    [ { A, 5.0 } { B, 2.0 } { D, 1.0 } ]
 12. (Added B) "ABD"    [ { A, 5.0 } { B, 3.0 } { D, 1.0 } ]
 13. (Added D) "ABD"    [ { A, 5.0 } { B, 3.0 } { D, 2.0 } ]
 14. (Added E) "ABE"    [ { A, 5.0 } { B, 3.0 } { E, 1.0 } ]
 15. (Added C) "ABC"    [ { A, 5.0 } { B, 3.0 } { C, 1.0 } ]
 16. (Added B) "ABC"    [ { A, 5.0 } { B, 4.0 } { C, 1.0 } ]
 17. (Added A) "ABC"    [ { A, 6.0 } { B, 4.0 } { C, 1.0 } ]

USAGE_WEIGHT = 0.5 (Fully LRU -- only recency matters)

  1. (Added A) "A"      [ { A, 1.0 } ]
  2. (Added B) "BA"     [ { B, 1.0 } { A, 0.5 } ]
  3. (Added C) "CBA"    [ { C, 1.0 } { B, 0.5 } { A, 0.25 } ]
  4. (Added B) "BCA"    [ { B, 1.25 } { C, 0.5 } { A, 0.125 } ]
  5. (Added A) "ABC"    [ { A, 1.0625 } { B, 0.625 } { C, 0.25 } ]
  6. (Added A) "ABC"    [ { A, 1.5313 } { B, 0.3125 } { C, 0.125 } ]
  7. (Added D) "DAB"    [ { D, 1.0 } { A, 0.7656 } { B, 0.1563 } ]
  8. (Added A) "ADB"    [ { A, 1.3828 } { D, 0.5 } { B, 0.0781 } ]
  9. (Added C) "CAD"    [ { C, 1.0 } { A, 0.6914 } { D, 0.25 } ]
 10. (Added D) "DCA"    [ { D, 1.125 } { C, 0.5 } { A, 0.3457 } ]
 11. (Added A) "ADC"    [ { A, 1.1729 } { D, 0.5625 } { C, 0.25 } ]
 12. (Added B) "BAD"    [ { B, 1.0 } { A, 0.5864 } { D, 0.2813 } ]
 13. (Added D) "DBA"    [ { D, 1.1406 } { B, 0.5 } { A, 0.2932 } ]
 14. (Added E) "EDB"    [ { E, 1.0 } { D, 0.5703 } { B, 0.25 } ]
 15. (Added C) "CED"    [ { C, 1.0 } { E, 0.5 } { D, 0.2852 } ]
 16. (Added B) "BCE"    [ { B, 1.0 } { C, 0.5 } { E, 0.25 } ]
 17. (Added A) "ABC"    [ { A, 1.0 } { B, 0.5 } { C, 0.25 } ]

USAGE_WEIGHT = 0.9 (LRFU -- Mixture of LRU and LFU)

  1. (Added A) "A"  [ { A, 1.0 } ]
  2. (Added B) "BA" [ { B, 1.0 } { A, 0.9 } ]
  3. (Added C) "CBA"    [ { C, 1.0 } { B, 0.9 } { A, 0.81 } ]
  4. (Added B) "BCA"    [ { B, 1.81 } { C, 0.9 } { A, 0.729 } ]
  5. (Added A) "ABC"    [ { A, 1.6561 } { B, 1.629 } { C, 0.81 } ]
  6. (Added A) "ABC"    [ { A, 2.4905 } { B, 1.4661 } { C, 0.729 } ]
  7. (Added D) "ABD"    [ { A, 2.2414 } { B, 1.3195 } { D, 1.0 } ]
  8. (Added A) "ABD"    [ { A, 3.0173 } { B, 1.1875 } { D, 0.9 } ]
  9. (Added C) "ABC"    [ { A, 2.7156 } { B, 1.0688 } { C, 1.0 } ]
 10. (Added D) "ADB"    [ { A, 2.444 } { D, 1.0 } { B, 0.9619 } ]
 11. (Added A) "ADB"    [ { A, 3.1996 } { D, 0.9 } { B, 0.8657 } ]
 12. (Added B) "ABD"    [ { A, 2.8796 } { B, 1.7791 } { D, 0.81 } ]
 13. (Added D) "ADB"    [ { A, 2.5917 } { D, 1.729 } { B, 1.6012 } ]
 14. (Added E) "ADE"    [ { A, 2.3325 } { D, 1.5561 } { E, 1.0 } ]
 15. (Added C) "ADC"    [ { A, 2.0993 } { D, 1.4005 } { C, 1.0 } ]
 16. (Added B) "ADB"    [ { A, 1.8893 } { D, 1.2604 } { B, 1.0 } ]
 17. (Added A) "ADB"    [ { A, 2.7004 } { D, 1.1344 } { B, 0.9 } ]

In the first example (USAGE_WEIGHT=1.0), the Scores of existing items do NOT change when new items are added (this is because we are multiplying by 1.0 on each step). This leads to simply incrementing Scores by 1.0 after each usage, so the Score directly represents the usage-count. Notice that Items are always listed in order of descreasing usage-count.

In the second example (USAGE_WEIGHT=0.5), the Scores of existing items are halved each time an item is added (this is because we are multiplying by 0.5 on each step). This leads to the property that all Scores in the list are lower than the most recently-added item (which receives a Score of 1.0) moreover it is always true that an Item added at step N will always have a greater score than one added in any step before N, regardless of how many times the item was re-added. This is exactly the property that produces the LRU policy.

Finally, when (USAGE_WEIGHT=0.9) we can see how the two policies are mixed. This third example starts out looking like LRU (i.e. "recency" is important). But as the usage-count of certain Items begins to increase, they begin to have an effect and shift the behavior. This can be seen by step 7 where LRU lists "DAB", but example 3 shows "ABD" due to the higher usages of A and B. Then example 3 looks more like LFU for a few steps, but an interesting thing happens at step 10. Here is where LRFU starts to shine. By this point, A has been added 4 times while "B", "C", and "D" have each been added twice. LRU shows "DCA" because "D" and "C" were more recently added, but it ignores that fact that the user is twice as likely to choose "A" over "D" or "C". LFU shows "ABD" which is okay except that the User selected "D" twice after choosing "B" which suggests that usage of "D" is "heating up" and therefore more likely than "B". LRFU gets it right by showing "ADB". Of course this is all somewhat subjective, and other reader might disagree that this is a better choice. After all, we're trying to predict the User's future choices based on previous history, so there's no perfect solution. But with LRFU you can at least "tune" the USAGE_WEIGHT parameter to find the right balance of LRU vs LFU for a given situation.

In fact, for some applications it might even be preferable to dynamically change the USAGE_WEIGHT as the program progresses to improve prediction based on historical data. This probably wouldn't make sense for user-interface MRU lists, but more for predicting high-volume or high-frequency events.

FYI, In the LRFU algorithm discussed in the paper, the Score is called "CRF". The algorithm they discuss also stores the "time" (or step number) at which each Score is calculated. By storing the Score and the time, is is possible to only update the Score for the Item being added as well as a small subset of Items -- not the entire list. In addition, the sort-order is maintained by the heap and linked-list data structure combination, so the algorithm is much more efficient than what I've described here using a simple array and recalculating and re-sorting the list after each addition. But this implementation is straightforward, easy to understand, and will work fine for user-interface MRU lists.

Here is a very basic implementation of a naive LRFU list in Java. There is a lot that can be done to improve it in terms of performance, but it sufficiently demonstrates the results of LRFU:

public static void main( String[] args ) {
    double[] weights = { 1.0, 0.5, 0.9 };
    for(double weight : weights) {
        System.out.println("USAGE_WEIGHT = " + weight);
        testMRU(weight);
        System.out.println();
    }
}

private static void testMRU(double weight) {
    PrintStream p = System.out;
    MRUList<String> list = new MRUList<>(3, weight);
    String[] lettersAdded = "A B C B A A D A C D A B D E C B A".split(" ");
    for(int i = 0; i < lettersAdded.length; i++) {
        String value  = lettersAdded[i];
        list.add(value);
        p.printf("%3s. (Added %s) \"", i, value);
        for(MRUItem<String> item : list.list)
            p.print(item.Value);
        p.print("\"\t[ ");
        for(MRUItem<String> item : list.list) {
            p.printf("{ %s, %.5s } ", item.Value, item.Score);
        }
        p.println("]");
    }
}

private static class MRUList<T> {
    public static final double SCORE_INIT = 1.0;

    private double usageWeight; // factor that balances LRU vs LFU
    private int maxSize; // maximum number of MRU items.
    public ArrayList<MRUItem<T>> list;

    public MRUList(int maxItemCount) { this(maxItemCount, 0.9); }
    public MRUList(int maxItemCount, double usageWeightFactor) {
        maxSize = maxItemCount;
        usageWeight = usageWeightFactor;
        list = new ArrayList<>(maxSize);
    }

    // Add an item each time the user chooses it.
    public void add(T value) {
        // Update the score of all existing items
        for(MRUItem<T> item : list)
            item.Score *= usageWeight; // age the items (this does not affect sort order)

        // Search for the item in the list.
        MRUItem<T> existing = find(value);
        if (existing==null) {
            existing = new MRUItem<>(value, SCORE_INIT);
            if (list.size()<maxSize) {
                // we have room -- add the item.
                list.add(existing);
            } else {
                // no more room -- replace last item.
                list.set(list.size() - 1, existing);
            }
        } else {
            // increment the score of the item if it already existed in the list.
            existing.Score += SCORE_INIT;
        }

        // Sort the items for display.
        // Collections.sort uses the Comparable interface of MRUItem.
        Collections.sort(list);
    }

    // Get a copy of the list of items, in the correct display order.
    public List<T> getItems() {
        ArrayList<T> copy = new ArrayList<>();
        for(MRUItem<T> item : list)
            copy.add(item.Value);
        return copy;
    }

    // return an item if it's Value is already present in the list.
    private MRUItem<T> find(T value) {
        for(MRUItem<T> item : list)
            if (Objects.equals(item.Value, value))
                return item;
        return null;
    }
}
private static class MRUItem<T> implements Comparable<MRUItem<T>> {
    public T Value;
    public double Score;
    public MRUItem(final T value, final double score) {
        Score = score;
        Value = value;
    }

    // Sorts by Score in descending order (due to - sign)
    @Override
    public int compareTo(final MRUItem<T> other) {
        return -Double.compare(Score, other.Score);
    }
}
like image 186
drwatsoncode Avatar answered Feb 23 '23 01:02

drwatsoncode