
In CouchDB, would a multi-emit map function to emulate an ad hoc type of query blow up my Couch size?

Tags:

couchdb

I have a CouchDB database (we'll say it holds project time card related data: project code, person, person's job title, task, date, hours worked, their bill rate, etc.). I want to create summary views of the project by day... or by person, or by task, or by title, or by any single attribute.

I'm concerned that I'm heading down an unsustainable path and that my database size may end up far bigger than it needs to be.

I created a view with a map function that emits each document several times, once for each attribute. That works. But does that ever reach an end point where you should stop?

I have multiple emits:

emit([doc.project, 'day', doc.day], doc);
emit([doc.project, 'month', doc.month], doc);
emit([doc.project, 'person', doc.person], doc);
emit([doc.project, 'job title', doc['persons-job-title']], doc);
emit([doc.project, 'task', doc.task], doc);

Then I always query with a start/end key of [project, attribute name] to [project, attribute name, {}].

Will my database eventually just get so huge as to make it prohibitively expensive to add any new data? Is multi-emit() the preferred method for doing what I'm trying to do? Is there a better/different way out there?

Would creating the emits dynamically based on the document just be asking for trouble in the case of some giant document coming through and creating huge storage requirements?

Basically, is there a point where I should just stop the madness?

user791770 asked Mar 07 '12 18:03



1 Answer

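First of all: don't emit the doc as a value. Every emitted value is stored in the view index, so emitting the whole doc five times copies it into the index five times. Use &include_docs=true if you need the document data in the result set.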
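
A minimal sketch of that first point, using field names from the question. The `emit` stub and `sampleDoc` are hypothetical and only there so the snippet runs outside CouchDB; in a design document only the `map` function is needed.

```javascript
// Stand-in for CouchDB's built-in emit(); hypothetical harness only.
const rows = [];
function emit(key, value) {
  rows.push({ key, value });
}

// Map function: emit null instead of the whole doc, so the view index
// stores only the keys. Fetch documents at query time with include_docs=true.
function map(doc) {
  emit([doc.project, 'day', doc.day], null);
  emit([doc.project, 'person', doc.person], null);
  emit([doc.project, 'task', doc.task], null);
}

// Hypothetical sample document.
const sampleDoc = { _id: 'tc1', project: 'project1', day: 9, person: 'alice', task: 'design' };
map(sampleDoc);
console.log(rows.length); // 3
```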

Second: assuming your data holds more than one project:

Does it make sense to ask for a project's data on a day without the month? If not, you can use emit([doc.project,'monthday',doc.month,doc.day],1). Then you can ask for everything in a project for one month:

startkey=["project1","monthday",3]&endkey=["project1","monthday",3,{}]

or for a single day of a month:

key=["project1","monthday",3,9]
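
Keys in a view query are JSON values, so they have to be JSON-encoded and URL-escaped before they go into the request. A small sketch of building the two queries above; the database name `timecards`, the design document `summary`, and the view name `by_project` are made up for illustration.

```javascript
// Build a view query string: each parameter is JSON-encoded, then URL-escaped.
function viewQuery(params) {
  return Object.entries(params)
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join('&');
}

// Hypothetical database, design document, and view names.
const base = '/timecards/_design/summary/_view/by_project';

// Everything for project1 in month 3. In CouchDB's collation an object
// sorts after numbers, strings, and arrays, so {} works as an upper bound.
const monthUrl = `${base}?${viewQuery({
  startkey: ['project1', 'monthday', 3],
  endkey: ['project1', 'monthday', 3, {}],
})}`;

// Exactly day 9 of month 3.
const dayUrl = `${base}?${viewQuery({ key: ['project1', 'monthday', 3, 9] })}`;

console.log(monthUrl);
console.log(dayUrl);
```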

If you use a simple reduce function (_sum), you get the added benefit of being able to ask how many entries a project has per month (this equals the number of days only if there is one document per project per day):

 startkey=["project1","monthday"]&endkey=["project1","monthday",{}]&group_level=3

...

{"key":["project1","monthday",2],"value":1},  // 1 day in month 2
{"key":["project1","monthday",3],"value":2}   // 2 days in month 3

Using group_level=4 (the full key, so the same as group=true; the rows match reduce=false only while every key is unique):

{"key":["project1","monthday",2,20],"value":1},
{"key":["project1","monthday",3,1],"value":1},
{"key":["project1","monthday",3,9],"value":1}

Of course, to get the document data for these rows, query with reduce=false&include_docs=true (CouchDB rejects include_docs on a reduced result).
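
To make the grouping concrete, here is a small in-memory sketch of what _sum combined with group_level computes. It is not CouchDB's implementation, only an illustration over hypothetical map rows.

```javascript
// Hypothetical map output: one row per emit, value 1, already in key order.
const mapRows = [
  { key: ['project1', 'monthday', 2, 20], value: 1 },
  { key: ['project1', 'monthday', 3, 1], value: 1 },
  { key: ['project1', 'monthday', 3, 9], value: 1 },
];

// Truncate each key to `level` elements and sum the values per truncated key.
function sumByGroupLevel(rows, level) {
  const totals = new Map();
  for (const { key, value } of rows) {
    const groupKey = JSON.stringify(key.slice(0, level));
    totals.set(groupKey, (totals.get(groupKey) || 0) + value);
  }
  return [...totals].map(([k, v]) => ({ key: JSON.parse(k), value: v }));
}

// group_level=3 collapses the day, leaving one row per month.
const perMonth = sumByGroupLevel(mapRows, 3);
console.log(JSON.stringify(perMonth));
```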

Third:

It is OK to emit more than one row per document. Of course, you could separate the emits into different views, so you do not need the second key element. Try to figure out which pieces of information belong together and are useless without the others (like day/month, person/job title?).
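
A sketch of that split, written as a design document. The design document name and the view names are hypothetical; the field names come from the question. Map functions are stored as strings inside the design document.

```javascript
// Hypothetical design document: one view per group of fields that belong together,
// so the discriminator element ('day', 'person', ...) is no longer needed in the key.
const designDoc = {
  _id: '_design/summary',
  views: {
    by_month_day: {
      map: "function (doc) { emit([doc.project, doc.month, doc.day], 1); }",
      reduce: '_sum',
    },
    by_person: {
      map: "function (doc) { emit([doc.project, doc.person], 1); }",
      reduce: '_sum',
    },
  },
};

console.log(JSON.stringify(designDoc, null, 2));
```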

Fourth:

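Adding data is not expensive, only building the views is ;-) After the initial build, a view is updated incrementally with just the changed documents the next time it is queried.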

okurow answered Nov 18 '22 12:11
