Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

web application user table primary key: surrogate key vs username vs email vs customer Id

I am trying to design an ecommerce web application in MySQL and I am having problems choosing the correct primary keys for the user table. the example given is just a sample example for illustration.

enter image description here

user table have following definition

CREATE  TABLE IF NOT EXISTS `mydb`.`user` (
  `id` INT NOT NULL ,
  `username` VARCHAR(25) NOT NULL ,
  `email` VARCHAR(25) NOT NULL ,
  `external_customer_id` INT NOT NULL ,
  `subscription_end_date` DATETIME NULL ,
  `column_1` VARCHAR(45) NULL ,
  `column_2` VARCHAR(45) NULL ,
  `colum_3` VARCHAR(45) NULL ,
  PRIMARY KEY (`id`) ,
  UNIQUE INDEX `username_UNIQUE` (`username` ASC) ,
  UNIQUE INDEX `email_UNIQUE` (`email` ASC) ,
  UNIQUE INDEX `customer_id_UNIQUE` (`external_customer_id` ASC) )
ENGINE = InnoDB

I am facing following issues with the primary key candidate columns:

Id column

Pros

  • No business meaning (stable primary key)
  • faster table joins
  • compacter index

cons

  • not a "natural" key
  • All attribute table(s) must be joined with the "master" user table, thus non-joining direct queries are not possible
  • causes less "natural" SQL queries
  • Leaks information: a user can figure out the number of registered user if start value is 0 (changing the start value sort this out) ii) A user register a profile as user_A at time_X and some time later as user_B at time_Y will be easily able to calculate the number of registered users over the time period ((Id for user_B) - (Id for user_A)/(time_Y - time_X))

email column

Pros

  • None

Cons

  • a user should be able to change the email address. Not suitable for primary key

username column

Pros

  • a "natural" primary key
  • Less table joins
  • simpler and more "natural" queries

Cons

  • varchar column is slower when joining tables
  • an index on a varchar column is less compact than int column index
  • very difficult to change username since foreign keys are dependent on the value. Solution: "Syncing" all foreign keys on application or not allowing a user to change the username, .e.g. a user should delete the profile a register new

external_customer column

pros

  • can be used as an external reference for a customer and holds no information (maybe non-editable username can be used instead? )

    cons

  • might leaks information if it is auto incremental (if possible)

  • problematic to generate unqiue value if an auto incremental surrogate id is already in use since MySQL innodb engine does not multiple auto_increment columns in same table

what are the common practice when choosing user table primary keys for a scalable ecommerce web application? all feedback appreciated

like image 676
infinityLoop Avatar asked Apr 01 '12 18:04

infinityLoop


2 Answers

I don't have anything to say about some of your analysis. If I've cut some of your pros or cons, that only means I don't think I have anything useful to add.

Id column

Pros

  • No business meaning (stable primary key)
  • faster table joins
  • compacter index

First, any column or set of columns declared NOT NULL UNIQUE has all the properties of a primary key. You can use any of them as the target for a foreign key reference, which is what all this is really about.

In your case, your structure allows 4 columns to be targets of a foreign key reference: id, username, email, and external_customer_id. You don't have to use the same one all the time. It might make sense to use id for 90% of your FK references, and email for 10% of them.

Stability doesn't have anything to do with whether a column has business meaning. Stability has to do with how often, and under what circumstances, a value might change. "Stable" doesn't mean "immutable" unless you're running Oracle. (Oracle can't do ON UPDATE CASCADE.)

Depending on your table structure and indexing, a natural key might perform faster. Natural keys make some joins unnecessary. I did tests before I built our production database. It's probably going to be decades before we reach the point that joins on ID numbers will outperform fewer joins and natural keys. I've written about those tests either on SO or on DBA.

You have three other unique indexes. (Good for you. I think at least 90% of the people who build a database don't get that right.) So it's not just that an index on an ID number is more compact than either of those three; it's also an additional index. (In this table.)

email column

Pros

  • None

An email address can be considered stable and unique. You can't stop people from sharing email addresses, regardless of whether it's the target for a foreign key reference.

But email addresses can be "lost". In the USA, most university students lose their *.edu email addresses with a year or so of graduation. If your email address comes through a domain that you're paying for, and you stop paying, the email address goes away. I imagine it's possible for email address like those to be given to new users. Whether that creates an unbearable burden is application-dependent.

Cons

  • a user should be able to change the email address. Not suitable for primary key

All values in a SQL database can be changed. It's only unsuitable if your environment doesn't let your dbms honor an ON UPDATE CASCADE declaration in a timely manner. My environment does. (But I run PostgreSQL on decent, unshared hardware.) YMMV.

username column

Pros

  • a "natural" primary key
  • Less table joins
  • simpler and more "natural" queries

Fewer joins is an important point. I have been on consulting gigs where I've seen the mindless use of ID numbers made people write queries with 40+ joins. Judicious use of natural keys eliminated up to 75% of them.

It's not important to always use surrogate keys as the target for your foreign keys (unless Oracle) or to always use natural keys as the target. It's important to think.

Cons

  • varchar column is slower when joining tables
  • an index on a varchar column is less compact than int column index

You can't really say that joining on a varchar() is slower without qualifying that claim. The fact is that, although most joins on varchar() are slower than joins on id numbers, they're not necessarily so slow that you can't use them. If a query takes 4ms with id numbers, and 6ms with varchar(), I don't think that's a good reason to disqualify the varchar(). Also, using a natural key will eliminate a lot of joins, so overall system response might be faster. (Other things being equal, 40 4ms joins will underperform 10 6ms joins.)

I can't recall any case in my database career (25+ years) where the width of an index was the deciding factor in choosing the target for a foreign key.

external_customer column

pros

  • can be used as an external reference for a customer and holds no information (maybe non-editable username can be used instead? )

There are actually few systems that let me change my username. Most will let me change my real name (I think), but not my username. I think an uneditable username is completely reasonable.

like image 118
Mike Sherrill 'Cat Recall' Avatar answered Oct 02 '22 02:10

Mike Sherrill 'Cat Recall'


In general, web applications try to keep their database schema away from the customer - including primary keys. I think you're conflating your schema design with authentication methods - there's nothing stopping you from allowing users to log in with their email address, even if your database design uses an integer to uniquely identify them.

Whenever I've designed systems like this, I've used an ID column - either integer or GUID for the primary key. It's fast, doesn't change due to pesky real life situations, and is a familiar idiom to developers.

I've then worked out the best authentication scheme for the app in hand - most people expect to login with their email address these days, so I'd stick with that. Of course, you could also let them login with their Facebook, Twitter, or Google accounts. Has nothing to do with my primary key, though...

like image 21
Neville Kuyt Avatar answered Oct 02 '22 03:10

Neville Kuyt