Database Design Guidelines For Developers

One of the developers that I work with asked me to write a “brief” (really brief) guide on database design. What I have written below is a summary in bullet point form of best practices when creating databases, tables, indexes, and then querying the data within. It is designed to be short and sweet so the reader can check each section quickly when creating a new database. References are provided so that the reader can use if they wish to explore a certain topic in more detail.

Let me know what you think! Is there anything glaringly obvious that I’ve missed? My email is dbafromthecold@gmail.com

Database Design Guidelines

Database Design

  • PRIMARY filegroup should be reserved for system objects
  • Secondary DATA filegroup should be set to default.
  • Minimum of two physical files mapped to each secondary filegroups
  • Data and Log files presized for a minimum of one year’s data inserts
  • Data and Log files should reside on separate drives
  • Denormalisation should only be considered in special circumstances. A properly normalised OLTP database will outperform a denormalised one
  • Recommended collation is Latin1_General_CI_AS

Table Design

  • Each table created within the database should have a Primary Key and a Clustered Index. These need not be the same column
  • GUID fields should not be used for clustered indexes even if used as table’s Primary Key
  • Composite keys should be avoided. Surrogate INT IDENTITY keys can be used as a replacement
  • Columns set to the smallest size possible – avoiding NVARCHAR(MAX), TEXT etc datatypes
  • Avoid Nullable columns. Nullable columns require extra space within the database to designate the fact that they are nullable
  • Use referential integrity – Foreign keys and unique constraints should be applied

MSDN CREATE TABLE Documentation – http://technet.microsoft.com/en-us/library/ms174979.aspx

Querying Data

  • Stored procedures should be used as much as possible, to allow query execution plans to be re-used
  • Do not use SELECT * – SELECT only the columns needed to keep the query’s memory usage as low as possible
  • As a standard, cursors should be avoided. Removal of RBAR (row-by-agonising-row) processing
  • SET NOCOUNT ON should be at the start of each SQL batch to reduce network traffic
  • Dynamic SQL should be executed using sp_executesql.
  • Do not repeatedly call functions within stored procedures, functions, batches and triggers.
  • Sargability (Search ARGument Able) – Confirm queries executed are able to seek on indexes in database
    • Avoid wildcard characters at the beginning of a word while searching using the LIKE keyword
    • Avoid searching using not equals operators (<> and NOT)
    • Avoid functions in WHERE clause on table columns
    • Avoid implicit conversions in WHERE clause (use CAST/CONVERT if necessary)
BAD:   SELECT ... WHERE Year(StartDate) = 2014
FIXED: SELECT ... WHERE StartDate &gt;= '01-01-2014' AND StartDate &lt; '01-01-2015'

BAD:   SELECT ... WHERE isNull(Surname,'Williams') = 'Williams'
FIXED: SELECT ... WHERE ((Surname = 'Williams') OR (Surname IS NULL))

BAD:   SELECT ... WHERE LEFT(FirstName,1) = 'P'
FIXED: SELECT ... WHERE FirstName LIKE 'P%'

BAD:   SELECT ... WHERE DateDiff(mm,OrderDate,GetDate()) &gt;= 30
FIXED: SELECT ... WHERE OrderDate &lt; DateAdd(mm,-30,GetDate())

Sargability References

Stack Overflow – What makes a SQL Statement sargable? http://stackoverflow.com/questions/799584/what-makes-a-sql-statement-sargable
Rob Farley – Sargable Functions http://blogs.msmvps.com/robfarley/2010/01/21/sargable-functions-in-sql-server/
Brent Ozar – Why %string% is slow http://www.brentozar.com/archive/2010/06/sargable-why-string-is-slow/

Indexes

  •  Naming convention – Prefix_TableName_ColumnName
  •  Prefix – PK for Primary Keys, IX for non-clustered indexes, UC for Unique Indexes
  • Make sure the option ONLINE = ON is in place
  • Correct nonclustered indexes created for incoming queries – avoid RID and KEY lookups on tables
  • No Indexes on columns like “Gender” as this would only have a maximum of three possible entries (Male, Female & NULL) making an index ineffective. Nonclustered indexes must be as selective as possible

 

8 thoughts on “Database Design Guidelines For Developers

  1. Fantastic – it’s good to define standards – Building without an architecture is like doing a jigsaw puzzle on your lap!!

    However, I do have one comment re “Denormalisation should only be considered in special circumstances”… Denormalisation in the OLTP system should be avoided, but in the reporting and/or Data Warehouse it is a necessary evil, as the design of OLAP is focused on the end-user and their experience in the data – you want to make it easy for the end-user to query the data without having to worry about joins. Hence many developers shriek when they see the star-schema designs, and have to fight the urge to normalise the data again.

    Knowing the difference between OLTP and OLAP is a vital skill for developer, and allows them to build the right solution in the right place with the right design.

  2. Very interesting. I’m also a DBA (since 1986); here are my thoughts on this:

    “Data and Log files should reside on separate drives”
    Absolutely, 100% correct. But scary in a *developers* guidelines, since developers should *not* be the ones designating disk drives.

    “Composite keys should be avoided. Surrogate INT IDENTITY keys can be used as a replacement”
    100% wrong. The biggest myth in physical db design. The proper clustered index is the single most important rule for performance and should be carefully chosen based on actual table needs, not a predetermined “rule”.

    “Avoid Nullable columns. Nullable columns require extra space within the database to designate the fact that they are nullable”
    It takes 1 additional *bit* per nullable column; thus, the extra space is inconsequential. If you need NULL, use it.

    “Do not repeatedly call functions within stored procedures, functions, batches and triggers.”
    Is this intended to mean don’t repeatedly call a function with the same parameters? Or to avoid functions altogether where possible? Either would have some merit, although you can’t generalize too much.

    “Derived tables should be used wherever possible”
    This is not necessarily true. The author of the underlying article referenced has some misconceptions about SQL Server. In particular: “The biggest benefit of using derived tables over using temporary tables is that they require fewer steps, and everything happens in memory instead of a combination of memory and disk.” Both derived tables and temp tables stay in memory only they fit there. SQL Server has continued to improve since the article was written.

    • Hey Scott, thanks for your response. I’ve responded below:-

      “Data and Log files should reside on separate drives”
      Absolutely, 100% correct. But scary in a *developers* guidelines, since developers should *not* be the ones designating disk drives.

      –I’ve included this as I want to get in into their mindset of not having all database files on one drive. The DBAs still designate the drives.

      “Composite keys should be avoided. Surrogate INT IDENTITY keys can be used as a replacement”
      100% wrong. The biggest myth in physical db design. The proper clustered index is the single most important rule for performance and should be carefully chosen based on actual table needs, not a predetermined “rule”.

      –What about two large NVARCHAR columns as a composite key? Would this be preferable to an INT IDENTITY column as the clustered index?
      –Agreed that this depends on the table but this was my way of getting developers to consider surrogate INT keys

      “Avoid Nullable columns. Nullable columns require extra space within the database to designate the fact that they are nullable”
      It takes 1 additional *bit* per nullable column; thus, the extra space is inconsequential. If you need NULL, use it.

      –I thought that SQL Server keeps a special bitmap per row to determine which columns are actually NULL. This then has to be decoded
      for each row accessed.

      “Do not repeatedly call functions within stored procedures, functions, batches and triggers.”
      Is this intended to mean don’t repeatedly call a function with the same parameters?
      Or to avoid functions altogether where possible? Either would have some merit, although you can’t generalize too much.

      –I’ve seen code where functions are being executed for each row in a cursor which has caused really poor performance

      “Derived tables should be used wherever possible”
      This is not necessarily true. The author of the underlying article referenced has some misconceptions about SQL Server.
      In particular: “The biggest benefit of using derived tables over using temporary tables is that they require fewer steps,
      and everything happens in memory instead of a combination of memory and disk.” Both derived tables and temp tables stay in memory only they fit there.
      SQL Server has continued to improve since the article was written.

      –I’ll remove/re-write this section.

      Thank you!

      Andrew

  3. I agree with Scott Pletcher. In the real world unknown or missing data is a fact of life and NULL should be used. I also find mutli-column keys to be beneficial and as one database expert I know said, “Never use a surrogate key where a natural key exists.” I have seen more than once in the last 20 years where blind obedience to design rules such as assigning a surrogate key to every table resulted in systems that did not perform well under load. Good design requires the intelligent application of available features. Still your overall list of rules is pretty good where SQL Server is the target db.

  4. “Avoid Nullable columns. Nullable columns require extra space within the database to designate the fact that they are nullable”

    To build on the previous conversation about this:
    The NULL Bitmap is ALWAYS present in data records, even if all columns in the table are declared NOT NULL.

    Sources:
    http://www.sqlskills.com/blogs/paul/a-sql-server-dba-myth-a-day-630-three-null-bitmap-myths/
    &
    SQL Server Internals 2012 – Kalen Delaney

    “Derived tables should be used wherever possible (SELECT statements within SELECT statements)
    Article on replacing temporary tables with derived tables to improve query performance http://www.sql-server-performance.com/2002/derived-temp-tables/

    This directive ignores the positives of using temporary tables, such as being able to generate & use statistics on intermediate resultsets. Cardinality estimates can really burn you here. It also bothers me that the source article is dated from 2002!

  5. I’ve thought of one I think needs to be added:
    For character/nchar values <= 3 characters, use char() not varchar() (2 extra bytes are required for every varchar column). If a column is 5-9 bytes and (almost) always used, consider char() as well; for example, zip code or SSN.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s