I am working on a project involving a large amount of data from the delicious website. The data available is "Date, UserId, Url, Tags" (for each bookmark).
I normalized my database to 3NF, and because of the nature of the queries we want to run in combination, I ended up with 6 tables... The design looks fine; however, now that a large amount of data is in the database, most queries need to join at least 2 tables to get an answer, sometimes 3 or 4. At first we didn't have any performance issues, because for testing purposes we hadn't loaded much data. Now that there is a lot of data, simply joining these extremely large tables takes a long time, and for our project, which has to be real-time, this is a disaster.
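To give an idea of the kind of query I mean, a typical one looks roughly like the sketch below (the table and column names are simplified placeholders, not my exact schema):

    -- Find all URLs that a given user has tagged with 'database'
    SELECT u.url
    FROM bookmarks b
    JOIN urls u           ON u.url_id = b.url_id
    JOIN bookmark_tags bt ON bt.bookmark_id = b.bookmark_id
    JOIN tags t           ON t.tag_id = bt.tag_id
    WHERE b.user_id = 42
      AND t.name = 'database';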
I was wondering how big companies solve these issues. It looks like normalizing tables just adds complexity, so how do big companies handle large amounts of data in their databases? Don't they use normalization?
Thanks.
Since you asked how big companies (generally) approach this:
They usually have a DBA (database administrator) who lives and breathes the database the company uses.
This means they have people who know everything from how to design tables optimally and how to profile and tune the queries, indexes, OS and server, down to knowing which firmware revision of the RAID controller can cause problems for the database.
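For example, one of the first things such a person would check is whether the columns used in your joins are indexed. A rough sketch, using the same made-up table and column names as the query above (your actual schema will differ):

    -- Index the foreign-key columns that the joins and filters touch
    CREATE INDEX idx_bookmarks_user_id    ON bookmarks (user_id);
    CREATE INDEX idx_bookmarks_url_id     ON bookmarks (url_id);
    CREATE INDEX idx_bookmark_tags_bm_id  ON bookmark_tags (bookmark_id);
    CREATE INDEX idx_bookmark_tags_tag_id ON bookmark_tags (tag_id);
    CREATE INDEX idx_tags_name            ON tags (name);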
You don't talk much about what kind of tuning you've done, e.g. whether you have added indexes like the ones sketched above or examined the execution plans of your slow queries.
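If you haven't looked at the plans yet, that is a good starting point. Most databases can show you how they intend to execute a query; the sketch below uses MySQL/PostgreSQL-style EXPLAIN on the illustrative query from above:

    -- Look for full table scans on large tables where an index lookup was expected
    EXPLAIN
    SELECT u.url
    FROM bookmarks b
    JOIN urls u           ON u.url_id = b.url_id
    JOIN bookmark_tags bt ON bt.bookmark_id = b.bookmark_id
    JOIN tags t           ON t.tag_id = bt.tag_id
    WHERE b.user_id = 42
      AND t.name = 'database';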