Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

ORM for large volume database

I am working on new project which have data oriented means very large volume of data (increasing day by day). So kindly suggest me which type of approach I should use to achieve desire functionality with out any hurdles.

  • Is database fully normalized?
  • Which ORM (linq2sql, entity framework) is suitable for this project?
  • Should I use stored procedures, db functions, triggers, etc?
like image 325
Xulfee Avatar asked May 30 '11 12:05

Xulfee


2 Answers

Whether or not the database is normalized is something you need to know and need to answer!

As for the ORM: it really depends on the type of data and its structure.

Linq-to-SQL is a very simplistic ORM that basically just does a 1:1 mapping of tables to domain objects. As long as you don't need anything else - that's fine. Linq-to-SQL is no longer being actively developed, so that might be a drawback. Also, stored proc support is a bit limited.

Entity Framework (at least in .NET 4) is great and is the current ORM of choice at Microsoft - it's being actively developed, has a lot of backing, lot of flexibility. It offers database-first, model-first and code-first development styles, it supports POCO objects and self-tracking entities, and is very well integrated with stored procs (you can define a stored proc for INSERT, UPDATE, DELETE on every single entity, if you wish to do so). It would be my first choice.

NHibernate is a great, enterprise-level ORM, well established and being actively developed - certainly not a "dead-end" like Linq-to-SQL. I've used it a few years ago, and while it's great and powerful, it's also a bit harder to learn than EF4 (no visual designer, needs more manual labor, manual effort). It's great if you really need all its power and if you're willing to invest the necessary up-front learning time.

As for the database: stored procs are definitely worth while investigating, especially if you need to encapsulate certain database processing into a nice proc to call from your code. I would be rather careful and defensive about using triggers and functions too much - they have their place, but they shouldn't be overused, since they do carry some problems with them (mostly performance problems and problems of "discoverability" - lots of devs don't think about triggers that could be in place, and will not understand what's going on).

like image 51
marc_s Avatar answered Sep 26 '22 01:09

marc_s


@Xulfee, that's a fairly broad question and a lot depends on the nature of your project. The approaches you reference affect a lot of aspects of the overall architecture. For example:

Is the database fully normalized?
Database normalization generally aids in tackling the problem of complexity of your conceptual model. When properly (note I did not say, "fully") normalized, your model should be fairly straight-forward and consumers of the database (developers, your BI team, domain experts, etc) should be able to get a good idea of the business problems that are being approached with your database. That having been said, normalization can lead to a fairly large reporting and analysis problem. When writing a query for a report against a large, fairly normalized database, you may introduce performance problems by joining a lot of tables. Enter snowflake schemas. So, to your question: it depends. What are you reporting requirements? How many transactions on average do you need to support? How complex is your conceptual model? Are you able to break the database into smaller models that are associated, rather than one large one?

Which ORM (linq2sql, entity framework) is suitable for this project?
Again, an ORM is a tool. You must ask yourself what is the specific job that you are trying to accomplish? The choice of an ORM (or in even using an ORM in the first place) is a decision that I would recommend you make fairly early on as it can affect everything from performance to development team cohesion. There are a lot of great choices out there:

  • Linq-To-Sql
  • NHibernate
  • Entity Framework
  • LLBLGen

Each of the above frameworks does a fantastic job of abstracting your persistence layer. Each has it's pro's and cons - the majority of which come down to infrastructure concerns: performance, configuration, schema/language compatibility, persistence patterns, vendor support. Given the choice, I would ask myself which of the frameworks is my development team most comfortable with? Which one supports the level of system activity that I expect? With which vendor am I willing to "throw-in"? I have seen fairly successful systems that use fairly small ORM's (i.e. Stackoverflow uses a modified version of Linq-To-Sql) as well as fairly large systems fail with fairly complex ORM's.

Should I use stored procedures, db functions, triggers, etc?
This question centers around your persistence store and how you use it (as well as how angry you want to make your DBA :) ). The use of sprocs (stored procedures) lends itself to allowing your dba to provide security at a very granular level. In addition, if the orm you are using generates dynamic sql, you might benefit from the database's ability to cache queries generated using sprocs. DB functions can be a double-sided blade. They offer the ability to add functionality and intelligence to your model, while at the same time allowing you to take a fairly large hit performance-wise (i.e. table-valued UDF's). Triggers have their own pitfalls and should be used with caution, but that discussion could get rather involved. The bottom-line for me in this case is: how much logic in the database do you want to support, and how important is security and performance? Do you have a qualified dba (not just a developer who knows how to write queries, but a dba who is capable of performance tuning and data modeling)? How big is your database? How complex is your data? Think about all of these questions and more when determining how you want to manage you data.

In summary, you are asking some good questions. Don't confuse infrastructure needs with implementation needs. Decide on a stack and run with it, don't get bogged-down in implementation details to the point at which you are unable to successfully complete the project. With the right level of abstraction, you may find it easier to try out new and different technologies without risking the overall success of the project. And remember: there's nothing wrong with experimenting and trying new things, just be prepared to fail gracefully and test, test, test!

like image 26
Bobby D Avatar answered Sep 24 '22 01:09

Bobby D