
Java: how to store sparse data efficiently

I have more than 1 billion items with approximately 1000 columns each (a matrix). For 95% of the columns, the ratio of unique values is less than one percent, so this data could be classified as sparse.

What is an efficient, production-ready solution for storing such data in Java?

Denis Kulagin asked Nov 10 '22 22:11



1 Answer

Not sure if you've thought this through. If you really have billions of rows, then even if you find a mechanism to store your sparse matrix efficiently, you may well have problems holding that much data in memory anyway.
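
To put rough numbers on that concern, here is a back-of-envelope sketch. The row and column counts come from the question; the 4-byte cell size is purely an assumption for illustration, and the class name DenseSizeEstimate is hypothetical.

```java
// Back-of-envelope estimate only; the 4-byte cell size is an assumption.
final class DenseSizeEstimate {

    /** Bytes needed for rows x cols cells of the given size, ignoring JVM overhead. */
    static long denseBytes(long rows, long cols, long bytesPerCell) {
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(Math.multiplyExact(rows, cols), bytesPerCell);
    }

    public static void main(String[] args) {
        long rows = 1_000_000_000L; // "more than 1 billion items" in the question
        long cols = 1_000L;         // "approximately 1000 columns"
        long bytes = denseBytes(rows, cols, 4); // assumed: one 4-byte int per cell
        System.out.println(bytes / 1_000_000_000_000L + " TB as a dense array");
    }
}
```

Under those assumptions the dense form comes to about 4 TB before any JVM object overhead, which is why the data has to be stored sparsely, and probably not in a single heap, to be workable at all.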

You could, however, use a simple map whose key is a Pair which holds the row and column for the datum.

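import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class Pair<P, Q> {

    public final P p;
    public final Q q;

    public Pair(P p, Q q) {
        this.p = p;
        this.q = q;
    }

    // Map keys need value-based equality: two pairs are equal when both parts are.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Pair)) return false;
        Pair<?, ?> other = (Pair<?, ?>) o;
        return Objects.equals(p, other.p) && Objects.equals(q, other.q);
    }

    @Override
    public int hashCode() {
        return Objects.hash(p, q);
    }
}

class Datum {
}
// My sparse database: the key is (row, column); an absent key means "no value".
Map<Pair<Integer, Integer>, Datum> data = new HashMap<>();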

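This stores only the cells that are actually populated, though each entry still carries the overhead of a map node, a Pair and two boxed Integers, so it does not necessarily solve your problem.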
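
A minimal sketch of how such a map might be wrapped, assuming every absent cell should read back as one shared default value. SparseMatrix, its nested Key class and the method names are all hypothetical and not from any library; Key plays the role of Pair, specialised to int so the example is self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical wrapper: keeps only cells whose value differs from the default.
final class SparseMatrix<V> {

    // (row, column) key with value-based equals/hashCode, as a HashMap key requires.
    private static final class Key {
        final int row;
        final int col;

        Key(int row, int col) {
            this.row = row;
            this.col = col;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return row == k.row && col == k.col;
        }

        @Override
        public int hashCode() {
            return 31 * row + col;
        }
    }

    private final Map<Key, V> cells = new HashMap<>();
    private final V defaultValue;

    SparseMatrix(V defaultValue) {
        this.defaultValue = defaultValue;
    }

    /** Stores a value; writing the default removes the entry instead of storing it. */
    void set(int row, int col, V value) {
        Key key = new Key(row, col);
        if (Objects.equals(value, defaultValue)) {
            cells.remove(key);
        } else {
            cells.put(key, value);
        }
    }

    /** Returns the stored value, or the default when the cell was never set. */
    V get(int row, int col) {
        return cells.getOrDefault(new Key(row, col), defaultValue);
    }

    /** Number of cells that actually occupy memory. */
    int storedCells() {
        return cells.size();
    }
}
```

For instance, on a SparseMatrix<String> created with "" as the default, calling set(5, 7, "x") makes get(5, 7) return "x" while every other cell still reads as "". Setting that cell back to "" drops the entry again, so storedCells() tracks only the non-default data.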

OldCurmudgeon answered Nov 15 '22 09:11
