dynamic data model

I have a project that requires user-defined attributes for a particular object at runtime (let's say a Person object in this example). The project will have many different users (1000+), each defining their own unique attributes for their own sets of 'Person' objects.

(E.g. user #1 will have a set of defined attributes, which will apply to all Person objects 'owned' by this user. Multiply this by 1000 users, and that's the bare minimum number of users the app will work with.) These attributes will be used to query the Person objects and return results.

I think these are the possible approaches I can use. I will be using C# (on .NET 3.5 or 4), and have free rein over what to use for a datastore. (I have MySQL and MSSQL available, but I am free to use any software, as long as it fits the bill.)

Have I missed anything, or made any incorrect assumptions in my assessment?

Out of these choices - what solution would you go for?

Hybrid EAV object model. (Define the database using normal relational model, and have a 'property bag' table for the Person table).

Downsides: many joins per query, poor performance, and a risk of hitting the limit on the number of joins / tables used in a query.

I've knocked up a quick sample that has a SubSonic 2.x-esque interface:

```
Select().From().Where  ... etc
```
This generates the correct joins, then filters and pivots the returned data in C#, to return a DataTable configured with the correctly typed data set.
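For illustration, the pivot step described above could look roughly like the sketch below. It assumes the joined query returns one narrow row per (person, attribute) pair, with the value stored as text. `PropertyRow`, `EavPivot`, and the `schema` dictionary are hypothetical names made up for this sketch; they are not part of SubSonic or of the sample mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Globalization;

// Hypothetical shape of one row read from the 'property bag' table.
public class PropertyRow
{
    public int PersonId;
    public string Name;   // attribute name, e.g. "Age"
    public string Value;  // stored as text in this sketch
}

public static class EavPivot
{
    // Builds one DataTable row per person and one typed column per attribute.
    // 'schema' maps attribute name -> CLR type (the user-defined metadata).
    public static DataTable Pivot(IEnumerable<PropertyRow> rows,
                                  IDictionary<string, Type> schema)
    {
        DataTable table = new DataTable("Person");
        table.Columns.Add("PersonId", typeof(int));
        foreach (KeyValuePair<string, Type> attr in schema)
            table.Columns.Add(attr.Key, attr.Value);

        // A primary key is required for Rows.Find below.
        table.PrimaryKey = new DataColumn[] { table.Columns["PersonId"] };

        foreach (PropertyRow row in rows)
        {
            Type type;
            if (!schema.TryGetValue(row.Name, out type))
                continue; // attribute not defined in the schema: skip it

            DataRow target = table.Rows.Find(row.PersonId);
            if (target == null)
            {
                target = table.NewRow();
                target["PersonId"] = row.PersonId;
                table.Rows.Add(target);
            }

            // Convert the stored text to the declared attribute type.
            target[row.Name] = Convert.ChangeType(
                row.Value, type, CultureInfo.InvariantCulture);
        }
        return table;
    }
}
```

With `Age` declared as `int` in `schema`, the resulting `Age` column has type `int`, so later filtering compares numbers rather than strings. Attributes a person does not have are left as `DBNull`.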

I have yet to load test this solution. It's based on the EAV advice in this Microsoft whitepaper: SQL Server 2008 RTM Documents, Best Practices for Semantic Data Modeling for Performance and Scalability

Allow the user to dynamically create / alter the object's table at run-time. This solution is what I believe NHibernate does in the background when using dynamic properties, as discussed here:

http://bartreyserhove.blogspot.com/2008/02/dynamic-domain-mode-using-nhibernate.html

Downsides:

As the system grows, the number of columns defined will get very large, and may hit the max number of columns. If there are 1000 users, each with 10 distinct attributes for their 'Person' objects, then we'd need a table holding 10k columns. Not scalable in this scenario.

I guess I could allow a person attribute table per user, but if there are 1000 users to start, that's 1000 tables plus the other 10 odd in the app.

I'm unsure if this would be scalable, but it doesn't seem so. Someone please correct me if I am incorrect!

Use a NoSQL datastore, such as CouchDB / MongoDB

From what I have read, these aren't yet proven in large-scale apps, are based on strings, and are very early in their development. If I am incorrect in this assessment, can someone let me know?

http://www.eflorenzano.com/blog/post/why-couchdb-sucks/

Using an XML column in the People table to store attributes

Drawbacks: no indexing for querying, so every XML column value would need to be retrieved and searched to return a result set, resulting in poor query performance.

Serializing an object graph to the database.

Drawbacks: no indexing for querying, so every serialized value would need to be retrieved and searched to return a result set, resulting in poor query performance.

C# bindings for Berkeley DB

From what I read here: http://www.dinosaurtech.com/2009/berkeley-db-c-bindings/

Berkeley Db has definitely proven to be useful, but as Robert pointed out – there is no easy interface. Your entire wOO wrapper has to be hand coded, and all of your indices are hand maintained. It is much more difficult than SQL / linq-to-sql, but that’s the price you pay for ridiculous speed.

Seems a large overhead. However, if anyone can provide a link to a tutorial on how to maintain the indices in C#, it could be a goer.

SQL / RDF hybrid. Odd I didn't think of this before. Similar to option 1, but instead of a "property bag" table, just XREF to an RDF store? Querying would then involve 2 steps: query the RDF store for people hitting the correct attributes, to return the person object(s), then use the IDs of these person objects in the SQL query to return the relational data. Extra overhead, but could be a goer.
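To make the two steps concrete, here is a minimal sketch. It is only an in-memory stand-in: the `Triple` class and the `RdfSqlHybrid` helper are hypothetical, a real RDF store would be queried through its own API, and the `Persons` table with `ID` and `Name` columns is an assumed schema.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// One RDF-style statement: (person ID, attribute, value). Hypothetical shape.
public class Triple
{
    public int Subject;
    public string Predicate;
    public string Object;
}

public static class RdfSqlHybrid
{
    // Step 1: ask the triple store which person IDs carry the attribute value.
    public static List<int> FindSubjects(IEnumerable<Triple> store,
                                         string predicate, string obj)
    {
        return store.Where(t => t.Predicate == predicate && t.Object == obj)
                    .Select(t => t.Subject)
                    .Distinct()
                    .OrderBy(id => id)
                    .ToList();
    }

    // Step 2: build a parameterised SQL query for the relational data.
    // Parameter values are returned separately so they can be bound to a command.
    public static string BuildPersonQuery(IList<int> ids,
                                          out Dictionary<string, int> parameters)
    {
        parameters = new Dictionary<string, int>();
        if (ids.Count == 0)
            return "SELECT ID, Name FROM Persons WHERE 1 = 0"; // nothing matched

        string[] names = new string[ids.Count];
        for (int i = 0; i < ids.Count; i++)
        {
            names[i] = "@p" + i;
            parameters[names[i]] = ids[i];
        }
        return "SELECT ID, Name FROM Persons WHERE ID IN ("
             + string.Join(", ", names) + ")";
    }
}
```

The extra round trip is the overhead mentioned above. Long ID lists would also need batching, since databases limit the number of parameters per command.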

asked Jan 10 '10 13:01

James

2 Answers

The ESENT database engine on Windows is used heavily for this kind of semi-structured data. One example is Microsoft Exchange which, like your application, has thousands of users where each user can define their own set of properties (MAPI named properties). Exchange uses a slightly modified version of ESENT.

ESENT has a lot of features that enable applications with large meta-data requirements: each ESENT table can have about 32K columns defined; tables, indexes and columns can be added at runtime; sparse columns don't take up any record space when not set; and template tables can reduce the space used by the meta-data itself. It is common for large applications to have thousands of tables/indexes.

In this case you can have one table per user and create the per-user columns in the table, creating indexes on any columns that you want to query. That would be similar to the way that some versions of Exchange store their data. The downside of this approach is that ESENT doesn't have a query engine so you will have to hand-craft your queries as MakeKey/Seek/MoveNext calls.

A managed wrapper for ESENT is here:

http://managedesent.codeplex.com/

answered Sep 19 '22 17:09

Laurion Burchall

In an EAV model you don't have to have many joins, as you only need the joins required for the query filtering. For the resultset, return the property entries as a separate rowset. That is what we are doing in our EAV implementation.

For example, a query might return persons with extended property 'Age' > 18:

Properties table:

PropertyID Name
1          Age
2          NickName

First resultset:

PersonID Name
1        John
2        Mary

Second resultset:

PersonID PropertyID Value
1        1          24
1        2          'Neo'
2        1          32
2        2          'Pocahontas'

For the first resultset, you need an inner join on the 'Age' extended property to filter the base Person entities:

select p.ID, p.Name from Persons p
join PersonExtendedProperties pp
on p.ID = pp.PersonID
where pp.PropertyName = 'Age'
and pp.PropertyValue > 18 -- probably need to convert to integer here

For the second resultset, we make an outer join of the first resultset with the PersonExtendedProperties table to get the rest of the extended properties. It's a 'narrow' resultset: we do not pivot the properties in SQL, so we don't need multiple joins here.

In practice, we use separate tables for different data types, which avoids data type conversion and keeps the extended properties indexed and easily queryable.
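Here is a rough C# sketch of the two-resultset idea with per-type property tables. It uses in-memory collections and LINQ as a stand-in for the SQL tables, so treat it as an illustration of the shape of the queries rather than our actual implementation. `TypedProperty<T>`, `NarrowRow`, and `TwoResultSets` are names invented for the sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Person { public int Id; public string Name; }

// One row of a per-type property table, e.g. an int table and a string table.
public class TypedProperty<T>
{
    public int PersonId;
    public int PropertyId;
    public T Value;
}

// One row of the narrow (unpivoted) second resultset.
public class NarrowRow
{
    public int PersonId;
    public int PropertyId;
    public object Value;
}

public static class TwoResultSets
{
    // First resultset: a single join against the int table, no conversion.
    public static List<Person> PersonsWhereIntGreater(
        IEnumerable<Person> persons,
        IEnumerable<TypedProperty<int>> intProps,
        int propertyId, int min)
    {
        return (from p in persons
                join ip in intProps on p.Id equals ip.PersonId
                where ip.PropertyId == propertyId && ip.Value > min
                select p).ToList();
    }

    // Second resultset: all properties of the matched persons, not pivoted.
    public static List<NarrowRow> PropertiesFor(
        IEnumerable<Person> matched,
        IEnumerable<TypedProperty<int>> intProps,
        IEnumerable<TypedProperty<string>> stringProps)
    {
        HashSet<int> ids = new HashSet<int>(matched.Select(p => p.Id));

        IEnumerable<NarrowRow> ints = intProps
            .Where(x => ids.Contains(x.PersonId))
            .Select(x => new NarrowRow {
                PersonId = x.PersonId, PropertyId = x.PropertyId, Value = x.Value });

        IEnumerable<NarrowRow> strings = stringProps
            .Where(x => ids.Contains(x.PersonId))
            .Select(x => new NarrowRow {
                PersonId = x.PersonId, PropertyId = x.PropertyId, Value = x.Value });

        return ints.Concat(strings)
                   .OrderBy(r => r.PersonId)
                   .ThenBy(r => r.PropertyId)
                   .ToList();
    }
}
```

The filter touches only the table whose type matches the predicate, and the second call returns one row per property instead of one join per property.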

answered Sep 16 '22 17:09

George Polevoy
