The input is List<Item>
sorted by score, Item
looks like:
class Item {
double score;
String category;
String author;
String poi;
}
Now I need to select 10 elements which have the highest scores from the array, under these limitations:
poi
author
category
. And the length of any subsequence from the same category
should not be longer than 2.If there is no subsequence which satisfies above rules, just return the first 10 elements.
Now, I directly iterate over the List
, and use three HashMap<String, Integer>
to store the appearences of each cagetory/poi/author
. And I use List<Item> selected
to store the result.
poi
, then the new element will be discarded.author
, then the new element will be discarded.category
, then the new element will be discarded.selected
that have this category
, then the new element will be discarded.It works when the input is large, but when the input is relatively small, it does not work. For example, when the input is
Then my solution will be
Item3
discarded, because it has the same category
as Item1
and Item2
Item4
discarded, because it has the same author
as Item2
And this does not satisfy the select top 10 elements
. The correct solution is discarding Item2
and only 10 elements should remain.
I think my solution is just in the wrong direction. So I'm looking for other solutions to deal with this problem. Any solution produces the desired output is appreciated.
This issue occurs when your list or library exceeds the list view threshold limit of 5,000 items for a view. For more information, go to SharePoint Online limits.
On the Home tab, in the Sort & Filter group, click Advanced and then click Advanced Filter/Sort on the shortcut menu. Add the fields on which you want to filter to the grid. In the Criteria row of each field, specify a criterion.
Select the data that you want to filter On the Data tab, in the Sort & Filter group, click Filter. in the column header to display a list in which you can make filter choices. Note Depending on the type of data in the column, Microsoft Excel displays either Number Filters or Text Filters in the list.
The original algorithm you used will always tend to minimize the number of results, because in any mutual-exclusive choice between items the highest-score item wins. This way the algorithm operates like a sieve, eliminating many lower-score items.
In order to support choosing a set of at least size X (10 in this case) from an original set of items length Y (11 in your example), you will need to collect a list of decision sets rather than eliminating items by score alone. A decision set(m,n) is a set of m items from which you must choose to keep n items and eliminate the rest. Since most of the rules in your system are single item of attribute x, most decision sets in your list will be set(m,1) - choose 1 of the m items, eliminate the rest.
The first pass on the full items set will populate the decision set list, and the second pass will go over that list and choose from each decision set the items to eliminate from the original set. Once a decision is made and the item/s are eliminated from the original set, the decision set is removed from the list (resolved). Once the list of decision sets has been cleared, your original set is legal.
The goal is to have the decision set list cleared in at most Y-X eliminations. Since an item can appear in multiple decision sets, you can also add for each item a "survival score". The survival score suggests the maximum number of item which will have to be eliminated if this item is kept. It is calculated per item by going over each decision set (m,n) and adding to each item contained m-n to its accumulated score.
Let's look at your example, and build its decision sets:
The decision sets we compile are (note the survival score in parenthesis):
Our goal is to resolve the decision set list in at most 1 elimination. You can see that all the items are with survival score 1 (meaning that keeping them will result in at most 1 other item eliminated) except for item 2 which has survival score of 2 (keeping it will eliminate at most 2 items). We cannot afford 2 items, and therefore we cannot afford to keep item 2 regardless of its score. eliminating it will resolve both decision sets and is the only option.
The more general algorithm can be more complex: in every iteration you eliminate the items with survival score you cannot afford, and if you're not near that limit use a combination of score and survival-score to decide which one must go.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With