MongoDB Approaches for storing large amounts of metrics / analytics data

Tags: mongodb, database-design, statistics, analytics

We are planning to use MongoDB to store large amounts of analytics data such as views and clicks. I'm unsure of the best way to structure the documents within MongoDB to aid querying and reduce database size.

We need to record actions against a page name, client and the type of action. Ideally we need stats that go down to the year/month/day/hour level; we don't need or care about views per second or minute. While this document structure looks OK, I'm aware that 100 visitors would generate 100 new documents.

{ 
  "_id" : ObjectId( "4dabdef81a34961506040000" ),
  "pagename" : "Hello",
  "action" : "view",
  "client" : "client-name",
  "time" : Date( "Mon Apr 18 07:49:28 2011" )
}

Is there a best-practice way of doing this, either using $inc or capped collections?


asked Apr 19 '11 06:04

Tom

2 Answers

Updated answer

Hacked together in the mongo shell:

use pagestats;

// a little helper: the filter that identifies the current hour's bucket
// for a page (UTC; note that getUTCMonth() is zero-based, 0 = January)
var pagePerHour = function(pagename) {
    var d = new Date();
    return {
        page : pagename,
        year: d.getUTCFullYear(),
        month: d.getUTCMonth(),
        day : d.getUTCDate(),
        hour: d.getUTCHours()
    };
};

// a pageview happened: increment the counter, creating the bucket if it doesn't exist yet
db.pagestats.updateOne(
    pagePerHour('Hello'),
    { $inc : { views : 1 }},
    { upsert : true });

// somebody tweeted our page twice!
db.pagestats.updateOne(
    pagePerHour('Hello'),
    { $inc : { tweets : 2 }},
    { upsert : true });

db.pagestats.find();
// { "_id" : ObjectId("4dafe88a02662f38b4a20193"),
//   "year" : 2011, "day" : 21, "hour" : 8, "month" : 3,
//   "page" : "Hello",
//   "tweets" : 2, "views" : 1 }

// 24 hour summary 'Hello' on 2011-4-21
for (var i = 0; i < 24; i++) {
    // careful: days (1-31), month (0-11) and hours (0-23)
    var stats = db.pagestats.findOne({ page: 'Hello', year: 2011, month: 3, day : 21, hour : i });
    if (stats) {
        // a bucket may hold only other counters (e.g. tweets), so default views to 0
        print(i + ': ' + (stats.views || 0) + ' views');
    } else {
        print(i + ': no hits');
    }
}
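To see why this avoids the one-document-per-visitor problem from the question, here is a plain-JavaScript sketch that emulates the upsert-with-$inc behavior in memory. The Map standing in for the collection, and the helpers `pagePerHourAt` (a hypothetical variant of `pagePerHour` that takes an explicit Date so it can be tested) and `incBucket`, are assumptions for illustration; nothing here talks to MongoDB. It uses the same UTC, zero-based-month bucketing as the shell helper.

```javascript
// In-memory sketch of the "one document per page per hour" pattern.
// A Map stands in for the pagestats collection; this is NOT the MongoDB driver.

// Hypothetical helper: bucket identity from an explicit Date
// (UTC parts, month 0-11, same fields as the shell helper).
function pagePerHourAt(pagename, date) {
  return {
    page: pagename,
    year: date.getUTCFullYear(),
    month: date.getUTCMonth(),
    day: date.getUTCDate(),
    hour: date.getUTCHours(),
  };
}

// Emulates updateOne(filter, { $inc: { [field]: amount } }, { upsert: true }):
// look up the bucket, create it from the filter fields if missing,
// then add to the counter ($inc on a missing field starts from 0).
function incBucket(store, filter, field, amount) {
  const key = JSON.stringify(filter);
  let doc = store.get(key);
  if (!doc) {
    doc = { ...filter };
    store.set(key, doc);
  }
  doc[field] = (doc[field] || 0) + amount;
  return doc;
}

const store = new Map();
const t = new Date(Date.UTC(2011, 3, 21, 8, 15)); // 2011-04-21 08:15 UTC

// 100 visitors in the same hour...
for (let i = 0; i < 100; i++) {
  incBucket(store, pagePerHourAt('Hello', t), 'views', 1);
}
// ...plus two tweets.
incBucket(store, pagePerHourAt('Hello', t), 'tweets', 2);

console.log(store.size);                       // number of bucket documents
console.log(store.values().next().value);      // the single hourly bucket
```

A hundred views in the same hour touch a single bucket; only the first event in a new hour creates a document.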

Depending on which aspects you want to track, you might consider adding more collections (e.g. a collection for user-centric tracking). Hope that helps.
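The 24-hour loop in the answer makes one findOne round trip per hour. A possible alternative, assuming you fetch all of a day's buckets with a single query such as db.pagestats.find({ page: 'Hello', year: 2011, month: 3, day: 21 }), is to fill in the missing hours client-side. `hourlyViews` is a hypothetical helper and the bucket documents below are made-up sample data in the shape of the find() output in the answer, run as plain JavaScript rather than in the shell.

```javascript
// Hypothetical helper: turn the bucket documents for one page-day into
// 24 hourly view counts, using 0 for hours that have no bucket.
function hourlyViews(buckets) {
  const views = new Array(24).fill(0);
  for (const b of buckets) {
    // A bucket may exist with only other counters (e.g. tweets), so default to 0.
    views[b.hour] = b.views || 0;
  }
  return views;
}

// Sample buckets for 'Hello' on 2011-04-21 (month 3 = April, zero-based).
const dayBuckets = [
  { page: 'Hello', year: 2011, month: 3, day: 21, hour: 8, views: 1, tweets: 2 },
  { page: 'Hello', year: 2011, month: 3, day: 21, hour: 13, tweets: 1 },
  { page: 'Hello', year: 2011, month: 3, day: 21, hour: 20, views: 5 },
];

hourlyViews(dayBuckets).forEach((n, hour) => {
  console.log(hour + ': ' + n + ' views');
});
```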

See also

Blogpost about Analytics Data

answered Sep 22 '22 15:09

Matt

I wouldn't worry too much about space. Mongo can scale pretty much indefinitely in that regard, and adding more storage is reasonably cheap.

One thing to be aware of: if you keep updating a document, its size can grow, which means Mongo will eventually need to move it to a new location on disk (and update the index entries that point to it). If a lot of documents are being updated and growing in size, Mongo will need to copy them around a lot, which can slow things down significantly. Of course, this all depends on how much traffic you're expecting.
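A common way to limit this kind of document growth (an assumption on my part, not something this answer proposes) is to pre-allocate each bucket with every counter set to zero, so later $inc updates only change values and never add fields. This mattered mainly for the older MMAPv1 storage engine, which moved documents that outgrew their allocated space. The sketch uses a hypothetical one-document-per-page-per-day layout with 24 hourly slots and plain-JavaScript helpers (`preallocatedDay`, `incHour`, `fieldPaths`, all hypothetical); it emulates the update on an in-memory object rather than calling MongoDB.

```javascript
// Hypothetical daily document: one per page per day, with every hourly
// counter created up front so later increments never add new fields.
function preallocatedDay(page, year, month, day) {
  const hours = {};
  for (let h = 0; h < 24; h++) {
    hours[h] = { views: 0, tweets: 0 };
  }
  return { page, year, month, day, hours };
}

// Emulates { $inc: { ['hours.' + hour + '.' + counter]: amount } } in memory.
// Unknown counters are rejected instead of created, to make the
// "no new fields" property explicit.
function incHour(doc, hour, counter, amount) {
  const slot = doc.hours[hour];
  if (!slot || !(counter in slot)) {
    throw new Error('counter not pre-allocated: ' + hour + '.' + counter);
  }
  slot[counter] += amount;
}

// List every leaf field as a dotted path, e.g. "hours.8.views".
function fieldPaths(obj, prefix = '') {
  return Object.entries(obj).flatMap(([k, v]) =>
    v !== null && typeof v === 'object' ? fieldPaths(v, prefix + k + '.') : [prefix + k]
  );
}

const doc = preallocatedDay('Hello', 2011, 3, 21);
const fieldsBefore = fieldPaths(doc);
incHour(doc, 8, 'views', 1);
incHour(doc, 8, 'tweets', 2);
const fieldsAfter = fieldPaths(doc);

// The set of fields is unchanged; only counter values moved.
console.log(fieldsBefore.length, JSON.stringify(fieldsBefore) === JSON.stringify(fieldsAfter));
console.log(doc.hours[8]);
```

In MongoDB itself the equivalent update would use a dotted path such as { $inc: { 'hours.8.views': 1 } }; the trade-off is a larger initial write per page per day.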

In my experience, go with a simple document format where you don't need to update documents. It might complicate your querying later on, but you can use map/reduce to get whatever information you want regardless of your document structure (map/reduce is very flexible; given enough experience, you can do anything with it).
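To make the map/reduce suggestion concrete, here is a plain-JavaScript emulation over raw event documents in the question's format: the map step turns each event into an hourly key with a count of 1, and the reduce step sums the counts per key. The function names are hypothetical, the timestamps are assumed to be UTC, and this is not MongoDB's mapReduce command (whose map functions call emit() on `this` rather than returning a pair). Note that map-reduce has been deprecated since MongoDB 5.0 in favor of the aggregation pipeline, where a $group stage performs the same kind of grouping.

```javascript
// Map step: turn one raw event document into a [key, value] pair.
// The key is page, client, action and the UTC hour bucket (month 0-11).
function mapEvent(ev) {
  const t = ev.time;
  const key = [
    ev.pagename, ev.client, ev.action,
    t.getUTCFullYear(), t.getUTCMonth(), t.getUTCDate(), t.getUTCHours(),
  ].join('|');
  return [key, 1];
}

// Reduce step: sum all values emitted for the same key.
// Takes (key, values) like a MongoDB reduce function; key is unused here.
function reduceCounts(key, values) {
  return values.reduce((a, b) => a + b, 0);
}

// Driver: group mapped values by key, then reduce each group.
function mapReduce(events) {
  const groups = new Map();
  for (const ev of events) {
    const [key, value] = mapEvent(ev);
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  }
  const result = {};
  for (const [key, values] of groups) {
    result[key] = reduceCounts(key, values);
  }
  return result;
}

// Sample raw events (one document per action, as in the question).
const events = [
  { pagename: 'Hello', action: 'view', client: 'client-name', time: new Date(Date.UTC(2011, 3, 18, 7, 49, 28)) },
  { pagename: 'Hello', action: 'view', client: 'client-name', time: new Date(Date.UTC(2011, 3, 18, 7, 55, 2)) },
  { pagename: 'Hello', action: 'click', client: 'client-name', time: new Date(Date.UTC(2011, 3, 18, 8, 1, 0)) },
];

console.log(mapReduce(events));
```

The raw documents stay write-once, and the hourly rollup is computed only when you need it.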

answered Sep 19 '22 15:09

skorks
