Database design for very large amounts of data

I am working on a project involving a large amount of data from the Delicious website. The available data is "Date, UserId, Url, Tags" (for each bookmark).

I normalized my database to 3NF, and because of the nature of the queries we wanted to use in combination, I ended up with 6 tables. The design looks fine; however, now that a large amount of data is in the database, most queries need to join at least 2 tables to get an answer, sometimes 3 or 4. At first we had no performance issues, because for testing purposes we had not added much data. Now that we have a lot of data, simply joining extremely large tables takes a lot of time, and for our project, which has to run in real time, this is a disaster.
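For illustration only (the actual schema isn't shown in the question), a 3NF design for this data and the kind of multi-table join that gets expensive might look roughly like this; all table and column names are guesses, not the real design:

    -- Hypothetical 3NF-style schema; names are illustrative only
    CREATE TABLE users     (user_id INT PRIMARY KEY, name VARCHAR(64));
    CREATE TABLE urls      (url_id  INT PRIMARY KEY, url  VARCHAR(2048));
    CREATE TABLE tags      (tag_id  INT PRIMARY KEY, tag  VARCHAR(64));
    CREATE TABLE bookmarks (bookmark_id INT PRIMARY KEY,
                            user_id INT, url_id INT, created DATE);
    CREATE TABLE bookmark_tags (bookmark_id INT, tag_id INT);

    -- A typical question ("which URLs did user 42 tag 'linux'?")
    -- already touches four tables, which is where the cost shows up:
    SELECT u.url
    FROM bookmarks b
    JOIN urls u           ON u.url_id       = b.url_id
    JOIN bookmark_tags bt ON bt.bookmark_id = b.bookmark_id
    JOIN tags t           ON t.tag_id       = bt.tag_id
    WHERE b.user_id = 42 AND t.tag = 'linux';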

I was wondering how big companies solve these issues. Normalizing tables seems to just add complexity, so how do big companies handle large amounts of data in their databases? Don't they use normalization?

Thanks.

Hossein asked Apr 09 '10


1 Answer

Since you asked how big companies (generally) approach this:

They usually have a DBA (database administrator) who lives and breathes the database the company uses.

This means they have people who know everything from how to design the tables optimally, to profiling and tuning the queries, indexes, OS, and server, to knowing which firmware revision of the RAID controller can cause problems for the database.

You don't say much about what kind of tuning you've done, e.g.:

  • Are you using MyISAM or InnoDB tables? Their performance (and not least their features) differs radically between workloads.
  • Are the tables properly indexed for the queries you run?
  • Run EXPLAIN on all your queries. It helps you identify keys that could be added or removed, shows whether the proper keys are being selected, and lets you compare queries (SQL gives you many ways to accomplish the same thing). See the sketch after this list.
  • Have you tuned the query cache? For some workloads the query cache (on by default) can cause a considerable slowdown; a sketch follows below.
  • How much memory does your box have, and is MySQL tuned to take advantage of it?
  • Do you use a file system and RAID setup geared towards the database?
  • Sometimes a little denormalization is needed (also sketched below).
  • Different database products have different characteristics; MySQL might be blazingly fast for some workloads and slow for others.
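As a minimal sketch of the EXPLAIN and indexing points, assuming the hypothetical bookmark schema sketched in the question above:

    -- Ask MySQL how it plans to execute a slow join
    EXPLAIN
    SELECT u.url
    FROM bookmarks b
    JOIN bookmark_tags bt ON bt.bookmark_id = b.bookmark_id
    JOIN tags t           ON t.tag_id       = bt.tag_id
    JOIN urls u           ON u.url_id       = b.url_id
    WHERE t.tag = 'linux';

    -- "type: ALL" or huge "rows" estimates usually mean a full scan.
    -- Indexes matching the filter and join columns are the usual fix:
    CREATE INDEX idx_tags_tag  ON tags (tag);
    CREATE INDEX idx_bt_tag_bm ON bookmark_tags (tag_id, bookmark_id);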
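For the query-cache and memory points, a sketch of how you might inspect and experiment (these are standard MySQL server variables of this era, but check your version's documentation):

    -- How much memory is MySQL actually allowed to use?
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';  -- InnoDB data/index cache
    SHOW VARIABLES LIKE 'key_buffer_size';          -- MyISAM index cache

    -- Query cache settings and hit ratio
    SHOW VARIABLES LIKE 'query_cache%';
    SHOW STATUS    LIKE 'Qcache%';

    -- On write-heavy workloads the cache can hurt, since every write
    -- invalidates cached results for the touched tables; disabling it
    -- is an easy experiment (the buffer sizes above live in my.cnf
    -- and need a server restart to change):
    SET GLOBAL query_cache_size = 0;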
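And one way "a little denormalization" might look here, again purely as a hypothetical sketch: copy the tag text onto the link rows so the hot query loses a join, at the cost of extra storage and of keeping the duplicate column in sync when tags are renamed:

    -- Duplicate the tag text onto the link table
    ALTER TABLE bookmark_tags ADD COLUMN tag VARCHAR(64);
    UPDATE bookmark_tags bt
    JOIN tags t ON t.tag_id = bt.tag_id
    SET bt.tag = t.tag;
    CREATE INDEX idx_bt_tag ON bookmark_tags (tag);

    -- The four-table query from the question now needs one join fewer:
    SELECT u.url
    FROM bookmark_tags bt
    JOIN bookmarks b ON b.bookmark_id = bt.bookmark_id
    JOIN urls u      ON u.url_id      = b.url_id
    WHERE bt.tag = 'linux' AND b.user_id = 42;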
leeeroy answered Oct 14 '22