
Partition a Set into smaller subsets and process them in batches

I have a continuously running thread in my application that keeps a HashSet of all the symbols used in the application. As originally designed, the thread's while (true) loop iterates over the HashSet on every pass and updates the database for every symbol it contains.

The HashSet may hold up to about 6000 symbols. Instead of updating the database for all 6000 symbols at once, I want to divide the HashSet into subsets of 500 each (12 subsets), process each subset separately, and have the thread sleep for 15 minutes after each subset, to reduce the load on the database.

How can I partition a set into smaller subsets and process each one? (I have seen examples for partitioning an ArrayList and a TreeSet, but none for a HashSet.)

This is my code (a sample snippet):

package com.ubsc.rewji.threads;

import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.concurrent.PriorityBlockingQueue;

public class TaskerThread extends Thread {
    private PriorityBlockingQueue<String> priorityBlocking = new PriorityBlockingQueue<String>();
    String symbols[] = new String[] { "One", "Two", "Three", "Four" };
    Set<String> allSymbolsSet = Collections
            .synchronizedSet(new HashSet<String>(Arrays.asList(symbols)));

    public void addsymbols(String commaDelimSymbolsList) {
        if (commaDelimSymbolsList != null) {
            String[] symAr = commaDelimSymbolsList.split(",");
            for (int i = 0; i < symAr.length; i++) {
                priorityBlocking.add(symAr[i]);
            }
        }
    }

    public void run() {
        while (true) {
            try {
                while (priorityBlocking.peek() != null) {
                    String symbol = priorityBlocking.poll();
                    allSymbolsSet.add(symbol);
                }
                Iterator<String> ite = allSymbolsSet.iterator();
                System.out.println("=======================");
                while (ite.hasNext()) {
                    String symbol = ite.next();
                    if (symbol != null && symbol.trim().length() > 0) {
                        try {
                            updateDB(symbol);

                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                }
                Thread.sleep(2000);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    public void updateDB(String symbol) {
        System.out.println("THE SYMBOL BEING UPDATED IS" + "  " + symbol);
    }

    public static void main(String args[]) {
        TaskerThread taskThread = new TaskerThread();
        taskThread.start();

        String commaDelimSymbolsList = "ONVO,HJI,HYU,SD,F,SDF,ASA,TRET,TRE,JHG,RWE,XCX,WQE,KLJK,XCZ";
        taskThread.addsymbols(commaDelimSymbolsList);

    }

}
Pawan asked Oct 17 '13 09:10


3 Answers

With Guava:

for (List<String> partition : Iterables.partition(yourSet, 500)) {
    // ... handle partition ...
}

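Or Apache Commons Collections, whose ListUtils.partition takes a List, so copy the set into a list first:

for (List<String> partition : ListUtils.partition(new ArrayList<>(yourSet), 500)) {
    // ... handle partition ...
}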
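To tie the partitioning back to the question's goal (update 500 symbols, pause, repeat), here is a minimal JDK-only sketch. The class name BatchedUpdater, the method processInBatches, and the handler callback are hypothetical names chosen for illustration; in the question's thread the handler would call updateDB for each symbol, and the pause would be 15 minutes. It assumes the set was created with Collections.synchronizedSet, as in the question, and therefore copies it while holding the set's own lock before iterating.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical helper for illustration; not part of the question's code.
final class BatchedUpdater {

    /**
     * Snapshots the set, then passes it to the handler in batches of at most
     * batchSize elements, sleeping pauseMillis between consecutive batches.
     * Returns the number of batches handed to the handler.
     */
    static int processInBatches(Set<String> symbols, int batchSize,
                                long pauseMillis, Consumer<List<String>> handler) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<String> snapshot;
        // A synchronized wrapper must be locked manually while it is traversed,
        // so take the copy inside a synchronized block on the set itself.
        synchronized (symbols) {
            snapshot = new ArrayList<>(symbols);
        }
        int batches = 0;
        for (int from = 0; from < snapshot.size(); from += batchSize) {
            int to = Math.min(from + batchSize, snapshot.size());
            handler.accept(snapshot.subList(from, to));
            batches++;
            if (to < snapshot.size()) { // pause only between batches
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore the interrupt flag
                    break;                              // and stop early
                }
            }
        }
        return batches;
    }

    public static void main(String[] args) {
        Set<String> symbols = Collections.synchronizedSet(
                new HashSet<>(Arrays.asList("ONVO", "HJI", "HYU", "SD", "F")));
        // Five symbols in batches of two gives three batches.
        int batches = processInBatches(symbols, 2, 10L,
                batch -> System.out.println("Updating batch: " + batch));
        System.out.println("Batches processed: " + batches);
    }
}
```

Working on a snapshot means symbols added while a batch is sleeping are only picked up on the next full pass, which seems acceptable for a loop that repeats indefinitely.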
Andrey Chaschev answered Oct 12 '22 00:10


Do something like this:

private static final int PARTITIONS_COUNT = 12;

List<Set<Type>> theSets = new ArrayList<Set<Type>>(PARTITIONS_COUNT);
for (int i = 0; i < PARTITIONS_COUNT; i++) {
    theSets.add(new HashSet<Type>());
}

int index = 0;
for (Type object : originalSet) {
    theSets.get(index++ % PARTITIONS_COUNT).add(object);
}

Now you have partitioned the originalSet into 12 other HashSets.
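As a runnable illustration of the round-robin idea above, here is a small sketch with the placeholder Type made generic. The class name RoundRobinPartition and the method split are hypothetical. Note that this approach fixes the number of subsets rather than their size: with 6000 symbols and 12 subsets each subset holds 500, but the subsets grow if the original set grows.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical wrapper around the round-robin loop shown above.
final class RoundRobinPartition {

    /** Distributes the elements of original over partitionsCount new sets. */
    static <T> List<Set<T>> split(Set<T> original, int partitionsCount) {
        if (partitionsCount <= 0) {
            throw new IllegalArgumentException("partitionsCount must be positive");
        }
        List<Set<T>> parts = new ArrayList<>(partitionsCount);
        for (int i = 0; i < partitionsCount; i++) {
            parts.add(new HashSet<>());
        }
        int index = 0;
        for (T item : original) {
            // Element k goes to subset k mod partitionsCount.
            parts.get(index++ % partitionsCount).add(item);
        }
        return parts;
    }

    public static void main(String[] args) {
        Set<String> symbols = new HashSet<>();
        for (int i = 0; i < 6000; i++) {
            symbols.add("SYM" + i); // made-up symbols for the demo
        }
        List<Set<String>> parts = split(symbols, 12);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println("Subset " + i + " holds " + parts.get(i).size() + " symbols");
        }
    }
}
```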

Amir Pashazadeh answered Oct 11 '22 23:10


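We can use the following approach to divide a Set into subsets of a fixed maximum size.

For the example below, the output is [a, b], [c, d] and [e], one subset per line. In general, the iteration order of a HashSet is not guaranteed, so which elements share a subset may vary.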

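import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Splits the set into consecutive subsets of at most partitionSize elements.
private static List<Set<String>> partitionSet(Set<String> set, int partitionSize)
{
    // A non-positive size would otherwise loop forever adding empty subsets.
    if (partitionSize <= 0)
    {
        throw new IllegalArgumentException("partitionSize must be positive");
    }
    List<Set<String>> list = new ArrayList<>();
    Iterator<String> iterator = set.iterator();

    while(iterator.hasNext())
    {
        // Fill one subset until it is full or the source is exhausted.
        Set<String> newSet = new HashSet<>();
        for(int j = 0; j < partitionSize && iterator.hasNext(); j++)
        {
            newSet.add(iterator.next());
        }
        list.add(newSet);
    }
    return list;
}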

public static void main(String[] args)
{
    Set<String> set = new HashSet<>();
    set.add("a");
    set.add("b");
    set.add("c");
    set.add("d");
    set.add("e");

    int size = 2;
    List<Set<String>> list = partitionSet(set, size);

    for(int i = 0; i < list.size(); i++)
    {
        Set<String> s = list.get(i);
        System.out.println(s);
    }
}
PipoTells answered Oct 12 '22 00:10