Let's say I have the following model in Java:
class Shape {
    String type;
    String color;
    String size;
}
And say I have the following data based on the model above.
Triangle, Blue, Small
Triangle, Red, Large
Circle, Blue, Small
Circle, Blue, Medium
Square, Green, Medium
Star, Blue, Large
I would like to answer the following questions:
Given the type Circle how many unique colors?
Answer: 1
Given the type Circle how many unique sizes?
Answer: 2
Given the color Blue how many unique shapes?
Answer: 3 (Triangle, Circle, Star)
Given the color Blue how many unique sizes?
Answer: 3
Given the size Small how many unique shapes?
Answer: 2
Given the size Small how many unique colors?
Answer: 1
I'm wondering if I should model it the following way...
set: shapes -> key: type -> bin(s): list of colors, list of sizes
set: colors -> key: color -> bin(s): list of shapes, list of sizes
set: sizes -> key: size -> bin(s): list of shapes, list of colors
Or is there a better way to do this? If I do it this way, I need three times the storage.
I also expect to have billions of entries in each set. Btw, the model has been redacted to protect the innocent code ;)
Data modeling in NoSQL is always about how you plan to retrieve the data, at what throughput and at what latency.
There are several ways to model this data; the simplest is to mimic the class structure, where each field becomes a Bin. You could then define a Secondary Index on each bin and use Aggregation Queries to answer the questions above.
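To make the shape of that approach concrete, here is a minimal in-memory sketch in plain Java. It is not Aerospike client code: the filter step only stands in for a secondary-index query on one bin, and the distinct count stands in for the aggregation. The names ShapeQueries and countDistinct are made up for illustration.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

public class ShapeQueries {
    // Mirrors the Shape class from the question; each field would be one bin.
    static class Shape {
        final String type;
        final String color;
        final String size;

        Shape(String type, String color, String size) {
            this.type = type;
            this.color = color;
            this.size = size;
        }
    }

    // The sample data from the question.
    static final List<Shape> DATA = List.of(
        new Shape("Triangle", "Blue", "Small"),
        new Shape("Triangle", "Red", "Large"),
        new Shape("Circle", "Blue", "Small"),
        new Shape("Circle", "Blue", "Medium"),
        new Shape("Square", "Green", "Medium"),
        new Shape("Star", "Blue", "Large"));

    // Keep the records matching the filter (the "index query"),
    // project one field, and count its distinct values (the "aggregation").
    static long countDistinct(List<Shape> records,
                              Predicate<Shape> filter,
                              Function<Shape, String> bin) {
        return records.stream().filter(filter).map(bin).distinct().count();
    }

    public static void main(String[] args) {
        System.out.println("Circle, unique colors: "
            + countDistinct(DATA, s -> s.type.equals("Circle"), s -> s.color));
        System.out.println("Circle, unique sizes: "
            + countDistinct(DATA, s -> s.type.equals("Circle"), s -> s.size));
        System.out.println("Small, unique shapes: "
            + countDistinct(DATA, s -> s.size.equals("Small"), s -> s.type));
    }
}
```

With this model each record is stored once, but every question scans all records that match the filter, so the cost grows with how many records share the queried value.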
But this is only one way; you may need to satisfy the factors of latency and throughput with a different data model.
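One such alternative is the denormalized layout proposed in the question: three sets, each keyed by one attribute, with bins holding the unique values of the other two. Below is a rough in-memory sketch of that layout, again in plain Java rather than Aerospike client code. The class name DenormalizedShapes and the bin names are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class DenormalizedShapes {
    // One map per "set": record key -> bin name -> unique values in that bin.
    final Map<String, Map<String, Set<String>>> shapes = new HashMap<>();
    final Map<String, Map<String, Set<String>>> colors = new HashMap<>();
    final Map<String, Map<String, Set<String>>> sizes = new HashMap<>();

    private static void put(Map<String, Map<String, Set<String>>> set,
                            String key, String bin, String value) {
        set.computeIfAbsent(key, k -> new HashMap<>())
           .computeIfAbsent(bin, b -> new TreeSet<>())
           .add(value);
    }

    // Every insert updates all three sets; this is the extra write and
    // storage cost the question is worried about.
    void add(String type, String color, String size) {
        put(shapes, type, "colors", color);
        put(shapes, type, "sizes", size);
        put(colors, color, "shapes", type);
        put(colors, color, "sizes", size);
        put(sizes, size, "shapes", type);
        put(sizes, size, "colors", color);
    }

    // Each question becomes a single key lookup plus the size of one bin.
    static int countUnique(Map<String, Map<String, Set<String>>> set,
                           String key, String bin) {
        Map<String, Set<String>> bins = set.getOrDefault(key, new HashMap<>());
        return bins.getOrDefault(bin, new TreeSet<>()).size();
    }

    // Loads the sample data from the question.
    static DenormalizedShapes sample() {
        DenormalizedShapes m = new DenormalizedShapes();
        m.add("Triangle", "Blue", "Small");
        m.add("Triangle", "Red", "Large");
        m.add("Circle", "Blue", "Small");
        m.add("Circle", "Blue", "Medium");
        m.add("Square", "Green", "Medium");
        m.add("Star", "Blue", "Large");
        return m;
    }

    public static void main(String[] args) {
        DenormalizedShapes m = sample();
        System.out.println("Circle, unique sizes: " + countUnique(m.shapes, "Circle", "sizes"));
        System.out.println("Small, unique colors: " + countUnique(m.sizes, "Small", "colors"));
    }
}
```

The trade-off is the reverse of the single-record model: reads are one lookup by key, while writes and storage are multiplied. Whether that is worth it depends on the read latency and throughput you need.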