 

View Collation with Couchbase

We are using Couchbase as our NoSQL store and love its capabilities. However, we have run into an issue when creating associations via view collation, which can be thought of as akin to a join operation. Our data sets are confidential, so I will illustrate the problem with the following model.

The volume of data is considerable, so it cannot be processed in memory. Let's say we have data on ice-creams, zip codes, and the average temperature of the day. One type of document contains a zip-code-to-temperature mapping, and the other holds transaction data for an ice-cream sold in a particular zip. The problem is to determine the set of top-selling ice-creams by the temperature of a given day.

We crunch this corpus with a view that emits two kinds of rows: one is a zip-code-to-temperature mapping, and the other represents an ice-cream sale in a zip code:

Key                   Value
[zip1]                temp1
[zip1,ice_cream1]     1
[zip2,ice_cream2]     1

The view collation here is a mechanism to create an association between the ice-cream sale, the zip, and the average temperature, i.e., a join.
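The collation-based join described above can be sketched in Python. This is only a simulation under stated assumptions: the rows are plain lists standing in for view output, the temperature values are made up, and `join_sales_to_temperature` is a hypothetical helper, not part of any Couchbase SDK. It relies on the fact that a key `[zip]` sorts immediately before every `[zip, ice_cream]` key.

```python
from collections import defaultdict

# Hypothetical view output as (key, value) rows; the temperatures are made up.
rows = [
    (["zip2", "ice_cream2"], 1),
    (["zip1"], 70),                # zip -> average temperature
    (["zip1", "ice_cream1"], 1),   # one sale in zip1
    (["zip2"], 85),
    (["zip1", "ice_cream1"], 1),
]


def join_sales_to_temperature(rows):
    """Walk rows in key order and attach each sale to its zip's temperature."""
    counts = defaultdict(int)      # (temperature, ice_cream) -> units sold
    current_zip, current_temp = None, None
    # Sorting lists of strings places [zip] right before [zip, ice_cream],
    # which mirrors the ordering the view is relying on.
    for key, value in sorted(rows, key=lambda row: row[0]):
        if len(key) == 1:                  # temperature row
            current_zip, current_temp = key[0], value
        elif key[0] == current_zip:        # sale row for the same zip
            counts[(current_temp, key[1])] += value
    return dict(counts)


result = join_sales_to_temperature(rows)
```

Sale rows whose zip has no preceding temperature row are skipped by this walk, which is the behaviour that becomes a problem once a time component is added to the keys.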

We have a constraint that the temperature lookup happens only once every 24 hours, when the zip is first seen, and that is the valid average temperature to use for that day. For example, if a lookup happened at 12:00 pm on Jan 1st, the next lookup does not happen until 12:00 pm on Jan 2nd. However, the average temperature accepted in the first lookup is valid only for Jan 1st, and the one from the second lookup only for Jan 2nd, including the first half of that day.

Things get complicated when I want to run the same query with a time component involved, concretely associating the average temperature of a day with the ice-creams that were sold on that day in that zip, e.g., x vanilla ice-creams were sold when the average temperature for that day was 70 F:

Key                          Value
[y,m,d,zip1]                 temp1
[y,m,d,zip2,ice_cream2]      1
[y,m,d2,zip1,ice_cream1]     1

This has an interesting impact on the queries. Say I query for the last day: I cannot make any association between the ice-creams and the temperature before the first lookup happens, since that is when the two keys align. The net effect is that I lose the ice-cream counts for the part of that day before the temperature lookup happens. I was wondering whether any of you have faced similar issues, and whether you are aware of a pattern or solution that avoids losing those counts.
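The gap can be reproduced with a small Python simulation. Everything here is an assumption made for illustration: the year, the zip name, and the helper `split_sales` are invented, and the rows are plain lists rather than real view results. The sketch separates sales that have a temperature row for their day and zip from sales that do not yet have one.

```python
def split_sales(rows):
    """Split date-prefixed sale rows into matched and unmatched counts."""
    temps = {}                    # (y, m, d, zip) -> temperature
    matched, unmatched = {}, {}
    # In sorted order, [y, m, d, zip] precedes [y, m, d, zip, ice_cream].
    for key, value in sorted(rows, key=lambda row: row[0]):
        day_zip = tuple(key[:4])
        if len(key) == 4:                     # temperature row
            temps[day_zip] = value
        elif day_zip in temps:                # sale with a known temperature
            group = (temps[day_zip], key[4])
            matched[group] = matched.get(group, 0) + value
        else:                                 # sale seen before the lookup
            group = (day_zip, key[4])
            unmatched[group] = unmatched.get(group, 0) + value
    return matched, unmatched


# Hypothetical rows: the Jan 2 temperature lookup has not happened yet.
rows = [
    ([2013, 1, 1, "zip1"], 70),
    ([2013, 1, 1, "zip1", "vanilla"], 1),
    ([2013, 1, 2, "zip1", "vanilla"], 1),
    ([2013, 1, 2, "zip1", "vanilla"], 1),
]

matched, unmatched = split_sales(rows)
```

Here the two Jan 2 sales end up in `unmatched`; a join that only keeps matched rows would silently drop them.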

hvd asked Nov 02 '22 06:11


1 Answer

First, welcome to StackOverflow, and thank you for the great question.

I understand the specific issue that you are having, but what I don't understand is the scale of your data, so please forgive me if I appear to be leading you down the wrong path with what I am about to suggest. We can work back and forth on this answer depending on how it suits your specific needs.

First, you have discovered that CB does not support joins in its queries. I am going to suggest that this is not really an issue when CB is used properly. The conceptual model for how Couchbase should be used to filter data is as follows:

  1. Create the CB view to be as precise as possible.
  2. Select records as precisely as possible from CB using the view.
  3. Fine-filter records as necessary in the data-access layer (also perform any joins) before sending them on to the rest of the application.

From your description, it sounds to me as though you are trying to be too clever with your CB view query. I would suggest one of two courses of action:

  1. Manually look up the value that you want, when this happens, with a second view query.
  2. Look up more records than you need, then fine-filter afterward (step 3 above).
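Both courses of action can be sketched together in Python. This is a minimal illustration under stated assumptions, not a definitive implementation: `join_in_data_layer` is a hypothetical data-access helper, the row lists stand in for the results of two broad view queries, `lookup_temperature` stands in for the second view query, and the dates and temperatures are made up. The join happens in application code, and sales without a temperature are kept in a pending bucket instead of being dropped.

```python
def join_in_data_layer(sale_rows, temp_rows, lookup_temperature=None):
    """Join sales to temperatures client-side, keeping unmatched sales.

    sale_rows: rows keyed [y, m, d, zip, ice_cream] with a count as value.
    temp_rows: rows keyed [y, m, d, zip] with the day's average temperature.
    lookup_temperature: optional callable standing in for a second view
        query; it takes (y, m, d, zip) and returns a temperature or None.
    """
    temps = {tuple(key): value for key, value in temp_rows}
    totals, pending = {}, {}
    for key, count in sale_rows:
        day_zip, ice_cream = tuple(key[:4]), key[4]
        if day_zip not in temps and lookup_temperature is not None:
            found = lookup_temperature(day_zip)   # option 1: second lookup
            if found is not None:
                temps[day_zip] = found            # remember it for later rows
        if day_zip in temps:
            group = (temps[day_zip], ice_cream)
            totals[group] = totals.get(group, 0) + count
        else:                                     # keep the count for later
            group = (day_zip, ice_cream)
            pending[group] = pending.get(group, 0) + count
    return totals, pending


# Hypothetical over-fetched rows (option 2): more than the query strictly needs.
temp_rows = [([2013, 1, 1, "zip1"], 70)]
sale_rows = [
    ([2013, 1, 1, "zip1", "vanilla"], 1),
    ([2013, 1, 2, "zip1", "vanilla"], 1),
    ([2013, 1, 2, "zip1", "vanilla"], 1),
]

# Before the Jan 2 lookup: those sales are held in `pending`, not lost.
totals, pending = join_in_data_layer(sale_rows, temp_rows)

# After the lookup: a dict's .get plays the role of the second view query.
late_temps = {(2013, 1, 2, "zip1"): 65}
totals_after, pending_after = join_in_data_layer(
    sale_rows, temp_rows, late_temps.get
)
```

Whether the pending counts are re-joined on the next query or persisted somewhere is a design choice that depends on the data volume, which this sketch does not address.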
theMayer answered Nov 15 '22 23:11