I have more than 1 billion items with approximately 1000 columns (a matrix). For 95% of the columns, however, the ratio of unique values is less than one percent, so the data could be classified as sparse.
What is an efficient, production-ready solution for storing such data in Java?
Not sure if you've thought this through. If you really have billions of rows, even if you find a mechanism to store your sparse matrix efficiently you may well have problems holding that much data in memory anyway.
You could, however, use a simple map whose key is a Pair which holds the row and column for the datum.
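import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
public class Pair<P, Q> {
    public final P p;
    public final Q q;
    public Pair(P p, Q q) {
        this.p = p;
        this.q = q;
    }
    // Value-based equals and hashCode, so equal (row, column) pairs hit the same map entry.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof Pair)) return false;
        Pair<?, ?> other = (Pair<?, ?>) o;
        return Objects.equals(p, other.p) && Objects.equals(q, other.q);
    }
    @Override
    public int hashCode() {
        return Objects.hash(p, q);
    }
}
class Datum {
}
// My sparse database.
Map<Pair<Integer, Integer>, Datum> data = new HashMap<>();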
This stores only the cells that actually hold a value, so it would use close to minimal storage, but it does not necessarily solve your problem.
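If it helps, here is a rough sketch of one variation on the same idea, not a definitive implementation. Instead of a Pair key it flattens the row and column into a single long, which is an assumption on my part to cope with row indexes above the int range. The class name SparseMatrix, its methods, and the notion of a default value for absent cells are all hypothetical and only there for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical wrapper around a map of non-default cells (illustration only).
final class SparseMatrix<T> {
    private final int columns;     // fixed number of columns per row (assumed known up front)
    private final T defaultValue;  // value reported for cells that were never stored
    private final Map<Long, T> cells = new HashMap<>();

    SparseMatrix(int columns, T defaultValue) {
        if (columns <= 0) {
            throw new IllegalArgumentException("columns must be positive");
        }
        this.columns = columns;
        this.defaultValue = defaultValue;
    }

    // Flattens (row, column) into one long key: row * columns + column.
    private long key(long row, int column) {
        if (row < 0 || column < 0 || column >= columns) {
            throw new IndexOutOfBoundsException("row=" + row + ", column=" + column);
        }
        return Math.addExact(Math.multiplyExact(row, (long) columns), (long) column);
    }

    // Stores a value; writing the default value removes the entry instead.
    void set(long row, int column, T value) {
        long k = key(row, column);
        if (Objects.equals(value, defaultValue)) {
            cells.remove(k);
        } else {
            cells.put(k, value);
        }
    }

    // Returns the stored value, or the default if the cell is absent.
    T get(long row, int column) {
        return cells.getOrDefault(key(row, column), defaultValue);
    }

    // Number of cells that actually occupy memory.
    int storedCells() {
        return cells.size();
    }
}
```

With this you would create something like new SparseMatrix<String>(1000, "") and call set and get with a row and column. The keys and map entries are still objects on the heap, so the memory concern for billions of rows applies here as well.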