
What's the best way to store/query multiple types within a RavenDB collection?

I am designing a logging system that will store its log entries in RavenDB, and for this particular system I want to store (and later query) documents that have varying data structures based on the type of event being logged. Consider the following events that I may want to log:

  1. User logs in - Store the UserID
  2. User deletes a file - Store the UserID and the filename that's being deleted

I have a few different ways I can go here...

Option A. Create two completely different types

class LoginEvent
{
  public int UserId { get; set; }
}

class FileDeleteEvent
{
  public int UserId { get; set; }
  public string Filename { get; set; }
}

This approach results in two distinct collections within RavenDB and they are easily queryable. However, retrieving a union of all the log entries requires multiple queries and multiple round-trips to the server--one for the LoginEvents and a second for the FileDeleteEvents. With only two event types it doesn't make much difference, but the problem gets considerably worse as the number of event types increases.

Option B. Create a base class and derive from that

abstract class Event
{
}

class LoginEvent : Event
{
  public int UserId { get; set; }
}

class FileDeleteEvent : Event
{
  public int UserId { get; set; }
  public string Filename { get; set; }
}

I tried this approach, but RavenDB appears to store and query documents by their concrete runtime types, not by the base type they are declared as: when I ran Query<Event>().ToArray() I got zero results. To get the documents back I would have to query each concrete type individually, which effectively makes this equivalent to Option A above.

Option C. Create varying property classes

enum EventType { Login, FileDelete }

class Event
{
  public EventType EventType { get; set; }
  public object Info { get; set; }
}

class LoginInfo
{
  public int UserId { get; set; }
}

class FileDeleteInfo
{
  public int UserId { get; set; }
  public string Filename { get; set; }
}

With this approach we always store an entry of type Event, but we populate its Info property with the corresponding Info class that provides details specific to the event type. At first this option seemed the best, as it stores all log entries in a single Event collection and makes querying the full collection easy. However, let's say I only want the FileDelete events where the Filename is "test.txt". This becomes a little tricky.

For example, the following throws a somewhat obscure error about the "Filename" field not being indexed:

var events = session.Query<Event>()
  .Where(a => a.EventType == EventType.FileDelete)
  .Where(a => ((FileDeleteInfo)a.Info).Filename == "test.txt")
  .ToArray();

The following, aside from not being what I want, returns zero results:

var events = session.Query<Event>()
  .Select(a => a.Info)
  .OfType<FileDeleteInfo>()
  .Where(a => a.Filename == "test.txt")
  .ToArray();

Even the following projection, which the documentation lists as a supported operation, doesn't return the expected type; it returns a set of intermediate results that I can't make sense of:

var events = session.Query<Event>()
  .Select(a => a.Info)
  .ToArray();

So, although this option is probably good from a data storage perspective, it fails from a queryability perspective (assuming I am building the query correctly; there might be another way I'm not considering).

Option D. Create a giant event class with all possible properties

enum EventType { Login, FileDelete }

class Event
{
  public EventType EventType { get; set; }
  public int UserId { get; set; }
  public string Filename { get; set; }
  .
  .
  .
}

This approach, although a bit wasteful from a storage perspective, is trivial from a queryability perspective. The problem occurs when you start adding more types of events you want to log--then the number of properties begins to mushroom.

Option E. Forget RavenDB and use Entity Framework + SQL

I can do this fairly trivially and query efficiently using one of EF's inheritance mapping patterns (for example, table-per-hierarchy). The downside to this approach is that SQL is serious overkill for this problem: we don't need the data consistency and other rigor that relational systems provide. And, in my experience, SQL inserts are much slower than storing documents in RavenDB (an important consideration for a logging system).

So, there are the options ... what do you think? Is there anything I missed?

Possibly related: Specifying Collection Name in RavenDB

Mike asked Feb 25 '12 06:02


2 Answers

The "official" way to solve this problem appears to be a Polymorphic Index: https://ravendb.net/docs/article-page/3.0/csharp/indexes/indexing-polymorphic-data

Here is a blog entry discussing this approach in detail: http://www.philliphaydon.com/2011/12/14/ravendb-inheritance-revisited/

There is also a video here: http://youtu.be/uk2TVs-d6sg
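
As a rough illustration of the polymorphic-index idea, here is a minimal sketch of a multi-map index over the Option B classes from the question. It assumes the RavenDB 3.0 client namespaces used by the linked documentation; the index name Events_ByUserAndFilename and the EventQueries helper are hypothetical names chosen for this example, not something taken from the linked articles.

```csharp
using System.Linq;
using Raven.Client;
using Raven.Client.Indexes;

public abstract class Event { }

public class LoginEvent : Event
{
    public int UserId { get; set; }
}

public class FileDeleteEvent : Event
{
    public int UserId { get; set; }
    public string Filename { get; set; }
}

// Hypothetical index: one map per concrete type, each emitting the same fields
// so that documents from both collections end up in a single index.
public class Events_ByUserAndFilename : AbstractMultiMapIndexCreationTask
{
    public Events_ByUserAndFilename()
    {
        AddMap<LoginEvent>(events => from e in events
                                     select new { e.UserId, Filename = (string)null });

        AddMap<FileDeleteEvent>(events => from e in events
                                          select new { e.UserId, e.Filename });
    }
}

public static class EventQueries
{
    // Returns every event, whatever its concrete type, in one round-trip.
    public static Event[] All(IDocumentSession session)
    {
        return session.Query<Event, Events_ByUserAndFilename>().ToArray();
    }

    // Only FileDeleteEvent documents carry a non-null Filename in the index,
    // so this filter should match deletions only.
    public static FileDeleteEvent[] DeletionsOf(IDocumentSession session, string filename)
    {
        return session.Query<FileDeleteEvent, Events_ByUserAndFilename>()
                      .Where(e => e.Filename == filename)
                      .ToArray();
    }
}
```

The index has to be registered on the server before it can be queried, for example with new Events_ByUserAndFilename().Execute(store), where store is your initialized document store. With this approach each concrete type keeps its own collection; only the index spans them.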

Mike answered Nov 15 '22 16:11


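Go with the base class approach (Option B). The trick is to give every type in the hierarchy the same type tag name, so that all events are stored in, and can be queried from, a single collection. Checking against the base type means the tag also applies when you call Query<Event>(). Conventions are normally set on the document store before it is initialized:

// "store" is assumed to be your DocumentStore instance.
store.Conventions.FindTypeTagName = type =>
{
    // Event and every class derived from it share one collection.
    if (typeof(Event).IsAssignableFrom(type))
        return "event";
    // All other types keep RavenDB's default tag name.
    return DocumentConvention.DefaultTypeTagName(type);
};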
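
A short usage sketch of what the shared collection makes possible, assuming the convention maps Event and all of its subclasses to the same tag and that the older Raven.Client namespace is in use. The EventLog helper class and its method names are hypothetical, introduced only for illustration.

```csharp
using System.Linq;
using Raven.Client;

public abstract class Event { }

public class LoginEvent : Event
{
    public int UserId { get; set; }
}

public class FileDeleteEvent : Event
{
    public int UserId { get; set; }
    public string Filename { get; set; }
}

// Hypothetical helper; assumes the session comes from a store whose
// FindTypeTagName convention puts all Event subclasses in one collection.
public static class EventLog
{
    // Stores any kind of event in the shared collection.
    public static void Record(IDocumentSession session, Event e)
    {
        session.Store(e);
        session.SaveChanges();
    }

    // One query, one round-trip, every event type.
    public static Event[] All(IDocumentSession session)
    {
        return session.Query<Event>().ToArray();
    }

    // Filters on a field that only FileDeleteEvent documents have.
    public static FileDeleteEvent[] DeletionsOf(IDocumentSession session, string filename)
    {
        return session.Query<FileDeleteEvent>()
                      .Where(e => e.Filename == filename)
                      .ToArray();
    }
}
```

One thing to watch with a shared collection: a query typed as FileDeleteEvent without a filter that excludes other event types may bring back LoginEvent documents as well, which cannot be returned as FileDeleteEvent. Querying as Event and filtering afterwards is the safer default.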

Daniel Lang answered Nov 15 '22 17:11