Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

NVARCHAR(?) for Email addresses in SQL Server

For Email addresses, how much space should I give the columns in SQL Server.

I found this definition on Wikipedia:

http://en.wikipedia.org/wiki/Email_address

The format of email addresses is local-part@domain where the local-part may be up to 64 characters long and the domain name may have a maximum of 253 characters - but the maximum 256 characters length of a forward or reverse path restricts the entire email address to be no more than 254 characters

And this one:

http://askville.amazon.com/maximum-length-allowed-email-address/AnswerViewer.do?requestId=1166932

So for now, total characters allowed for e-mail address is 64 (local part) + 1("@" sign) + 255 (domain part) = 320

It's possible that in the future they'll increase the local-part limit to 128 characters. which would make total of 384 characters.

Any thoughts?

like image 480
tugberk Avatar asked Feb 15 '12 14:02

tugberk


People also ask

How many characters is nvarchar 255?

nvarchar(255) (in SQL Server) stores 255 Unicode characters (in 510 bytes plus overhead).

How do I get nvarchar length in SQL Server?

But if the storage length is desired, you can drag the table name into the query window using SSMS, highlight it, and use 'Alt-F1' to see the defined lengths of each column. If you insert ASCII characters into the varchar and nvarchar fields, it will allow you to put 10 characters into all of them.

What is the data type for email address?

Email address is a standard Experience Data Model (XDM) data type that describes the details of an email address.

Can I use varchar for email?

2. you can use varchar as your data type for email column as emails are usually composed of letters, numbers and special characters. The right value of data length for the email field is database-agnostic.


1 Answers

I've always used 320 based on your latter calculation. It doesn't cost you anything to allow more*, unless people abuse it and stuff junk in there. It could cost you to allow less, as you'll have a frustrating users if they have legitimately longer e-mail addresses and now you'll have to go back and update schema, code, parameters etc. In the system I used to work with (an e-mail service provider), the longest e-mail address I came across naturally was about 120 characters - and it was clear they were just making a long e-mail address for grins.

* Not strictly true, since memory grant estimates are based on the assumption that varying-width columns are half-populated, so a wider column storing the same data can have lead to vastly different performance characteristics of certain queries.

And I've debated whether NVARCHAR is necessary for e-mail address. I've yet to come across an e-mail address with Unicode characters - I know the standard supports them, but so many existing systems do not, it would be pretty frustrating if that was your e-mail address.

And while it's true that NVARCHAR costs double the space, with SQL Server 2008 R2 you can benefit from Unicode compression, which basically treats all non-Unicode characters in an NVARCHAR column as ASCII, so you get those extra bytes back. Of course compression is only available in Enterprise+...

Another way to reduce space requirements is to use a central lookup table for all observed domain names, and store LocalPart and DomainID with the user, and store each unique domain name only once. Yes this makes for more cumbersome programming, but if you have 80,000 hotmail.com addresses, the cost is 80,0000 x 4 bytes instead of 80,000 x 11 bytes (or less with compression). If storage or I/O is your bottleneck, and not CPU, this is definitely an option worth investigating.

I wrote about this here:

  • Storing E-mail addresses more efficiently in SQL Server
like image 74
Aaron Bertrand Avatar answered Oct 11 '22 13:10

Aaron Bertrand