Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Social web application database design: how can I improve this schema?

Background

I am developing a social web app for poets and writers, allowing them to share their poetry, gather feedback, and communicate with other poets. I have very little formal training in database design, but I have been reading books, SO, and online DB design resources in an attempt to ensure performance and scalability without over-engineering.

The database is MySQL, and the application is written in PHP. I'm not sure yet whether we will be using an ORM library or writing SQL queries from scratch in the app. Other than the web application, Solr search server and maybe some messaging client will interact with the database.

Current Needs

The schema I have thrown together below represents the primary components of the first version of the website. Initially, users can register for the site and do any of the following:

  • Create and modify profile details and account settings
  • Post, tag and categorize their writing
  • Read, comment on and "favorite" other users' posts
  • "Follow" other users to get notifications of their activity
  • Search and browse content and get suggested posts/users (though we will be using the Solr search server to index DB data and run these type of queries)

Schema

Here is what I came up with on MySQL Workbench for the initial site. I'm still a little fuzzy on some relational databasey things, so go easy.

Schema Image

Questions

  1. In general, is there anything I'm doing wrong or can improve upon?
  2. Is there any reason why I shouldn't combine the ExternalAccounts table into the UserProfiles table?
  3. Is there any reason why I shouldn't combine the PostStats table into the Posts table?
  4. Should I expand the design to include the features we are doing in the second version just to ensure that the initial schema can support it?
  5. Is there anything I can do to optimize the DB design for Solr indexing/performance/whatever?
  6. Should I be using more natural primary keys, like Username instead of UserID, or zip/area code instead of a surrogate LocationID in the Locations table?

Thanks for the help!

like image 821
Arjun Avatar asked Oct 02 '10 05:10

Arjun


1 Answers

In general, is there anything I'm doing wrong or can improve upon?

Overall, I don't see any big flaws in your current setup or schema.

What I'm wonderng is your split into 3 User* tables. I get what you want your intendtion was (having different user-related things seperate) but I don't know if I would go with the exact same thing. If you plan on displaying only data from the User table on the site, this is fine, since the other info is not needed multiple times on the same page but if users need to use their real name and display their real name (like John Doe instead of doe55) than this will slow down things when the data gets bigger since you may require joins. Having the Preferences seperate seems like a personal choice. I have no argument in favor of nor against it.

Your many-to-many tables would not need an addtional PK (e.g PostFavoriteID). A combined primary of both PostID and UserID would be enough since PostFavoriteID is never used anywhere else. This goes for all join tables

Is there any reason why I shouldn't combine the ExternalAccounts table into the UserProfiles table?

As withe the prev. answer, I don't see a advatanage or disadvantage. I may put both in the same table since the NULL (or maybe better -1) values would not bother me.

Is there any reason why I shouldn't combine the PostStats table into the Posts table?

I would put them into the same table using a trigger to handle the increment of the ViewCount table

Should I expand the design to include the features we are doing in the second version just to ensure that the initial schema can support it?

You are using a normalsied schema so any additions can be done at any time.

Is there anything I can do to optimize the DB design for Solr indexing/performance/whatever?

Can't tell you, haven't done it yet but I know that Solr is very powerfull and flexible so I think you should be doing fine.

Should I be using more natural primary keys, like Username instead of UserID, or zip/area code instead of a surrogate LocationID in the Locations table?

There are many threads here on SO discussing this. Personally, I like a surrogate key better (or another unique number key if available) since it makes queries more easier and faster since an int is looked up easier. If you allow a change of username/email/whatever-your-PK-is than there are massive updates required. With the surrogate key, you don't need to bother.

What I would also do is to add things like created_at, last_accessed at (best done via triggers or procedures IMO) to have some stats already available. This can realy give you valuable stats

Further strategies to increate the performance would be things like memcache, counter cache, partitioned tables,... Such things can be discussed when you are really overrun by users because there may be things/technologies/techniques/... that are very specific to your problem.

like image 125
DrColossos Avatar answered Sep 28 '22 17:09

DrColossos