
How should I design the database structure for this problem?

I am rebuilding the backend system of a site with a lot of traffic.

This is the core of the application, and the way I build this part of the database is critical for a big chunk of the code and the upcoming work. The system described below will have to run millions of times each day. I would appreciate any input on the issue.

The background is that a user can add what he or she has been eating during the day.

Simplified, the process is more or less this:

  1. The user arrives at the site, and the site lists his/her choices for the day (if entered previously, as the steps below describe).
  2. The user can add a meal, consisting of anywhere from one to an unlimited number of different food items and their quantities. The meal is added through a search field and is organized into types (like 'Breakfast', 'Lunch').
  3. During the meal-building process, a list of the most commonly used food items (primarily those used by this user, but secondarily those used by all users) will be shown for quick selection.
  4. The meals will be stored in a FoodLog table that consists of something like this: id, user_id, date, type, food_data.

What I currently have is a huge database of food items that the search will run against. The food items are stored with both a common name (like "pork cutlets") and a producer name (like "Coca-Cola"), along with the other detailed information needed.

Question summary:

My problem is that I do not know the best way to store the data so that it is easily accessible in the ways I need, without the database getting out of hand.

Consider 1 million users adding 1 to 7 meals each day. Storing every food item of every meal, for every user, every day would potentially create (1 * avg_num_meals * avg_num_food_items) million rows each day.
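
To put a rough number on that (the averages here are purely hypothetical, for illustration): with an average of 3 meals per user per day and 4 food items per meal, that is

    1,000,000 users * 3 meals * 4 items = 12,000,000 rows per day,

or on the order of 4 billion rows per year.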

Storing the data in some compressed way (e.g. food_data as a json_encoded string) would reduce the number of rows significantly, but at the same time it would make it hard to build the 'most used food items' list and other statistics on the fly.

Should the table be split into several tables? If this is the case, how would they interact?

The site is currently hosted on a mid-range CDN and is using a LAMP (Linux, Apache, MySQL, PHP) backbone.

Mattis asked Dec 05 '22


2 Answers

Roughly, you want a fully normalized data structure for this. You want to have one table for Users, one table for Meals (one entry per meal, with a reference to User; you probably also want to have a time / date of the meal in this table), and a table for MealItems, which is simply an association table between Meal and the Food Items table.

So when a User comes in and creates an account, you make an entry in the Users table. When a user reports a Meal they've eaten, you create a record in the Meals table, and a record in the MealItems table for every item they reported.
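
As a minimal sketch of that structure in MySQL (the table and column names here are placeholders, and food_items simply stands in for the existing food database):

    CREATE TABLE users (
        id       INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        name     VARCHAR(100) NOT NULL
        -- plus whatever account columns you already have
    );

    -- stands in for the existing food database; only the columns used here are shown
    CREATE TABLE food_items (
        id       INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        name     VARCHAR(200) NOT NULL,
        producer VARCHAR(200) NULL
    );

    CREATE TABLE meals (
        id       BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        user_id  INT UNSIGNED NOT NULL,
        eaten_on DATE NOT NULL,
        type     VARCHAR(20) NOT NULL,        -- 'Breakfast', 'Lunch', ...
        FOREIGN KEY (user_id) REFERENCES users (id),
        INDEX idx_user_date (user_id, eaten_on)
    );

    -- association table: one row per food item in a meal
    CREATE TABLE meal_items (
        id           BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        meal_id      BIGINT UNSIGNED NOT NULL,
        food_item_id INT UNSIGNED NOT NULL,
        quantity     DECIMAL(8,2) NOT NULL DEFAULT 1,
        FOREIGN KEY (meal_id) REFERENCES meals (id),
        FOREIGN KEY (food_item_id) REFERENCES food_items (id),
        INDEX idx_food (food_item_id)
    );

The composite index on (user_id, eaten_on) is an assumption about the dominant access pattern (one user's log for one day); adjust the indexes to whatever queries actually dominate.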

This structure makes it straightforward to have a variable number of items with every meal, without wasting a lot of space. You can determine the representation of items in meals with a relatively simple query, as well as determine the total set of items any one user has consumed in any given timespan.
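
For instance, the 'most commonly used food items' list from the question is just an aggregate over the association table. A sketch, reusing the placeholder names from the schema above (the ? is a prepared-statement parameter for the user's id):

    -- this user's most frequently logged food items
    SELECT fi.id, fi.name, COUNT(*) AS times_used
    FROM meal_items mi
    JOIN meals m       ON m.id = mi.meal_id
    JOIN food_items fi ON fi.id = mi.food_item_id
    WHERE m.user_id = ?
    GROUP BY fi.id, fi.name
    ORDER BY times_used DESC
    LIMIT 10;

    -- the same idea across all users, for the global fallback list
    SELECT fi.id, fi.name, COUNT(*) AS times_used
    FROM meal_items mi
    JOIN food_items fi ON fi.id = mi.food_item_id
    GROUP BY fi.id, fi.name
    ORDER BY times_used DESC
    LIMIT 10;

At very high traffic you would probably cache or pre-aggregate these counts rather than run them on every page load, but the normalized structure is what makes that pre-aggregation straightforward.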

This normalized table structure will support a VERY large number of records as well as a large number of queries against the database.

Paul Sonier answered Dec 10 '22


First,

Storing the data in some compressed way (e.g. food_data as a json_encoded string)

is not a recommended idea. This will cause you countless headaches in the future as new requirements are added.

You should definitely have a few tables here.

Users
id, etc

Food Items
id, name, description, etc

Meals
id, user_id, category, etc

Meal Items
id, food_item_id, meal_id

The Meal Items table would tie the Meals to the Food Items using ids, and the Meals would be tied to Users using ids. This makes it simple to use joins to get detailed lists of data: totals, averages, etc. If the fields are properly indexed, this should be a great model for supporting a large number of records.
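
For illustration, a day's log for one user could come from a join like the one below. This is only a sketch: it assumes a date column on Meals (called logged_on here) and a quantity column on Meal Items, neither of which is spelled out in the column lists above, and it uses lowercase underscore table names.

    -- one user's full food log for today
    SELECT m.category,
           f.name,
           mi.quantity
    FROM meals m
    JOIN meal_items mi ON mi.meal_id = m.id
    JOIN food_items f  ON f.id = mi.food_item_id
    WHERE m.user_id = ?
      AND m.logged_on = CURDATE()
    ORDER BY m.category, f.name;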

dqhendricks answered Dec 10 '22