Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Ideal Java Data Structure for Streaming data

I had a specific use case in mind but was not able to figure out the right data structure to use.

I have one thread which keeps streaming objects into a HashMap. Something similar to market data where you have a high and unknown frequency of ticks.

Another thread which constantly reads this map for updated Price objects and queries by key in no particular order. The queries may be multiple times for the same key in a given cycle. The reads and writes are very frequent but the read thread is only interested in the latest available data that is fully updated and doesn't necessarily block till write is complete.

I wanted your thoughts on an ideal data structure for such use cases. Are there better implementations than ConcurrentHashMap that is available?

Thanks

like image 569
trequartista Avatar asked Oct 06 '22 01:10

trequartista


2 Answers

ConcurrentHashMap. From Javadoc

A hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.

Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration.

like image 187
Aravind Yarram Avatar answered Oct 10 '22 03:10

Aravind Yarram


One approach would be a copy-on-write scheme, something like this:

public class Prices {
    private volatile Map<String, Integer> prices = Collections.emptyMap();

    public void putPrice(String ticker, int price) {
        HashMap<String, Integer> newPrices = new HashMap<String, Integer>(prices);
        newPrices.put(ticker, price);
        prices = newPrices;
    }

    public Integer getPrice(String ticker) {
        return prices.get(ticker);
    }
}

This has a minimal overhead for gets - one read from a volatile, and then a normal hash lookup. However, it has a substantial overhead for puts - the creation of a whole new map, plus a write to a volatile. If your ratio of reads to writes was high, this might still be a good tradeoff.

You can improve this by only mutating the map when you actually need to add a new entry, rather than updating an existing one; you can achieve that by using mutable values:

public class Prices {
    private volatile Map<String, AtomicInteger> prices = Collections.emptyMap();

    public void putPrice(String ticker, int price) {
        AtomicInteger priceHolder = prices.get(ticker);
        if (priceHolder != null) {
            priceHolder.set(price);
        }
        else {
            HashMap<String, AtomicInteger> newPrices = new HashMap<String, AtomicInteger>(prices);
            newPrices.put(ticker, new AtomicInteger(price));
            prices = newPrices;
        }
    }

    public Integer getPrice(String ticker) {
        AtomicInteger priceHolder = prices.get(ticker);
        if (priceHolder != null) return priceHolder.get();
        else return null;
    }
}

I'm not sure what the performance characteristics of an AtomicInteger are; it's possible this is slower than it looks. Assuming AtomicInteger is not unreasonably slow, this should be pretty fast - it involves two reads from a volatile plus a normal hash lookup for each get, and a read from a volatile, a hash lookup, and a single write to a volatile for updates to existing prices. It still involves duplicating the map for addition of new prices. However, in a typical market, that doesn't happen often.

like image 38
Tom Anderson Avatar answered Oct 10 '22 03:10

Tom Anderson