Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to consistently retain an exact percentage of elements from a List in Java?

I'm currently filtering a Java List to keep approximately a percentage of its elements using a random approach:

import java.util.List;
import java.util.Random;

public class Main {
    public static void main(String[] args) {
        List<String> list = List.of("1", "2", "3", "4", "5", "6", "7", "8", "9", "10");

        // Percentage of elements to retain (between 0 and 100)
        int percentageToRetain = 30; // Change this value to adjust the percentage

        Random random = new Random();
        List<String> result = list.stream()
                .filter(s -> random.nextInt(100) < percentageToRetain)
                .toList();

        System.out.println(result);
    }
}

While this works for approximating the percentage, each run produces different results, and the actual percentage retained can vary significantly from my target, especially with smaller lists.

For example: When targeting 30% of a 10-element list, I might get 2 elements (20%) on one run and 4 elements (40%) on another.

My question: How can I modify my code to consistently retain exactly the specified percentage of elements from a List? Ideally, I'd like a solution that:

  • Keeps exactly percentageToRetain percent of the elements (rounding on the lower as needed)

  • Still provides randomness in which specific elements are selected

  • Works efficiently for both small and large lists

How to handle this kind of "exact percentage sampling" problem?

like image 566
user30654079 Avatar asked Nov 16 '25 18:11

user30654079


1 Answers

Your Approach

@Marce Puente correctly points outin the question comments that retaining an exact percentage of elements is impossible, at least for smaller list sizes.

the idea of “exact percentage” is already wrong, imagine such a list: [ 10, 5, 1 ] and you want to keep it to *50%, what would you put, an item and a half, no, you have to set a tolerance, which will be determined by the number of item's.

For larger input sizes the rounding error decreases.

Another problem, as the busybee pointed out, is that discarding elements with a chance is not guranteed to return a stable amount of elements.

To illustrate that, here are 10 sets of random numbers between 0 and 100 (I noted the number of elements that would be retained given percentageToRetain = 30 in brackets):

0 64 93 11 77 13 82 87 8 68 (4 retains)
77 60 39 97 64 46 33 42 87 79 (0 retains)
26 91 0 32 49 31 15 7 2 7 (5 retains)
95 90 38 10 70 37 32 17 82 98 (2 retains)
11 43 96 58 10 40 30 4 40 54 (3 retains)
2 66 86 87 17 88 92 64 10 79 (4 retains)
3 86 75 51 30 37 2 96 49 89 (2 retains)
9 10 23 84 52 28 32 14 12 33 (5 retains)
53 91 54 66 28 61 67 35 45 70 (1 retains)
3 77 79 24 37 60 6 80 8 39 (4 retains)

My suggestion

  1. calculate the.number of elements you wish to retain: int numberOfElementsToRetain = (int) (source.size() * percentageToRetain / 100.0);
  2. make a copy of the input list to a known mutable type (you can skip this step if the input is guranteed to be mutable & you don't mind ruining it)
  3. shuffle it
  4. copy the first numberOfElementsToRetain elements to a new list & return that.

Thus we get:

public static <E> List<E> retainNumberOfElementsRandomly2(List<E> source, int percentageToRetain){
    int numberOfElementsToRetain = (int) (source.size() * percentageToRetain / 100.0);
    ArrayList<E> defcopy = new ArrayList<>(source);
    
    Random r = new Random();
    
    Collections.shuffle(defcopy, r);

    List<E> result = new ArrayList<>(numberOfElementsToRetain);
    
    for(int i = 0; i < numberOfElementsToRetain; i++){
        result.add(defcopy.get(i));
    }
    
    return result;
}
like image 168
Jannik S. Avatar answered Nov 18 '25 09:11

Jannik S.



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!