Creating a database is easy.
Designing an effective database, the right way, using good practices, is hard.
In this article, I’ve listed 24 different database design mistakes that you should try to avoid.
Mistake 1: Ignoring the Purpose or Frequency of the Data
When you design a database, you’re designing it to ensure it meets the needs of the business and the system that uses it.
Data is both stored and retrieved from the database, and it will help you to know:
- What the data is used for
- How often new data is inserted
- How often the data is updated
- How often the data is retrieved from the database
- How much data is expected to be stored
Each of these should be considered when designing your database.
If you’re designing a database for a data warehouse, then you’ll have a different design for a system that handles regular database transactions. A data warehouse will have a high number data retrieval operations and less inserts and updates, whereas a database that handles transactions will have a high number of inserts and a lower number of reads.
So, as a database designer, how can you find out how the data will be used?
Ask your team. If there’s a business analyst on the team, ask them. Ask the other developers. Ask the project manager or scrum master or product owner.
Mistake 2: Poor Normalisation
Normalisation is a process that transforms a rough idea of tables and columns into a database design that follows a specific set of rules, which aims to be efficient and eliminate redundancy.
When designing a database, you should follow the rules and process of normalisation (unless you’re designing a data warehouse, then you’ll have a different set of rules to follow).
Normalising isn’t something that everybody does in the same way. Different developers may apply the rules of normalisation and use their own experience and understanding of the data and come up with different database designs.
But, as long as you follow the rules of normalisation, your database should be well designed.
If not, you could end up with issues with the database, such as:
- Recording the same information twice
- Allowing the possibility of deleting data causing unintended data removals elsewhere
Your database should adhere to at least third normal form. If it doesn’t, spend the time to update the design.
Mistake 3: Having Redundant Data
This is a similar issue as normalisation, but you can still have a database that seems normalised but contains redundant data.
Redundant data is any data that is unnecessary or data that does not need to be stored.
One example of redundant data is a person’s current age. This is calculated based on a date of birth, and keeping the age up to date will require daily calculations on all values. A better way to determine an age would be to store a date of birth and have the age calculated by the system. This way the calculation is done elsewhere and not stored in the database.
Another example would be storing user information that is stored in other systems. For example, if you use a central account management system, which includes password validation, then you probably don’t need to store passwords in your database. This might meet the guidelines of third normal form but still means that redundant data is being stored.
Mistake 4: Multiple Pieces of Information in a Single Field
Another mistake that can be made is storing multiple pieces of information in a single field.
This is often something we do when we’re learning how to create databases and tables. However, as we get more experience and start working on systems that need to perform well, storing data in this way can be a problem.
One common example of this is an address field. Addresses could be stored for employees or customers in the system, such as “123 Main Street, Seattle, WA, USA, 54321”.
This address might seem like a single piece of information, but it actually includes several things:
- Street number
- Street name and type
- City
- State
- Country
- ZIP/postal code
I recommend storing the address in separate fields. A recent project I worked on was developing a CRM system, and they stored address data in separate fields like this, instead of a single field.
Why store data in separate fields?
It’s easier to find information. What if you wanted to find all the customers that were in Seattle? You would have to look inside this address field to find occurrences of “Seattle”, and performing a LIKE or a partial string match is an expensive operation when it comes to performance.
It could also make it hard to identify addresses where the city is Seattle. What if someone lives on “Seattle Rd”? Matching on just “Seattle” will also find that record, so you’ll have to add in more logic to exclude those.
Now, if you store the address data in different fields, it makes it easier to search and filter records. It also makes it easier to update data in the future.
Mistake 5: Using a Business Field for the Primary Key
When you design your tables, one thing you’ll need to define for each table is a primary key. A column or set of columns that identifies the record in a unique way.
Many tutorials and teachings say you should try to find a field that’s business-relevant, or useful to the users, as the primary key.
For example:
- Individual tax number, such as Social Security Number (USA) or Tax File Number (AUS)
- Company number, such as ACN (Australia)
- Part number or item number for eCommerce systems
However, I recommend creating an entirely new field to serve as the primary key.
Why?
Because the purpose of the primary key is different to the business field. The primary key should serve one purpose, and that is to uniquely identify the row. Other fields that are used by the users or the system have their own purpose.
Business fields may change in the future. For example, everyone has a social security number. But what if someone’s social security number changes? Or it gets reused? This tutorial on designing a database using OpenOffice explains the importance of primary keys (and many other tips on database design).
There’s a small chance this might happen. It’s also dependent on rules that are external to the system (e.g. the government, or other suppliers in the case of part numbers). If it does, then you’ll have issues with your data. The primary key will no longer be unique and you’ll have to re-align all of your data.
Creating a new field for the primary key is what I recommend, and what I mention in my SQL Best Practices post. Whether you call id “id” or “<table>_id” or something else is up to you. I mention this further down under the “naming conventions” section. This way, if the rules for social security numbers change, or if a supplier decides they are changing the rules for their part numbers, your system is mostly unaffected.
Mistake 6: Using a Composite Primary Key
Another mistake related to primary keys that I see is using a composite primary key.
Sure, this may not be a “mistake”, as it’s something that’s possible in database design, but it’s not something I recommend.
A composite primary key is a primary key that contains two or more columns. This is often done when the rules of your data determine that the combination of two (or more) fields is unique.
For example:
- Order ID and line number.
- Customer first name, last name, and date of birth
This is similar to the previous point of using business fields as the primary key.
It also has the same recommendation: create a new, dedicated field for the primary key.
The primary key field serves a single purpose of uniquely identifying the row within the system. As a result, it’s used in other tables as foreign keys.
Using a composite primary key means you’ll need to add two or three columns in these other tables to link back to this table, which isn’t as easy or efficient as using a single column.
If your business rules change and the composite key is no longer unique, or the format changes, then you’ll have all kinds of problems with the data.
You can still enforce the fact that the combination of these columns should be unique by creating a unique constraint on these tables. It just means that these columns are not the identifier of the row.
Mistake 7: Poor Referential Integrity
Referential integrity means ensuring the data is high quality and meets the needs of the system. This is best implemented using database constraints.
In Oracle, there are five constraints that can be implemented:
- Primary key constraint
- Foreign key constraint
- Unique constraint
- Not null constraint
- Check constraint
If you have specific rules that should be enforced on your data, then it’s a good idea to implement this on the tables.
But can’t you just leave it up to the user interface? Yes, this is possible. For example, to ensure a date of birth is always entered, the field on the screen can be mandatory and there can be a validation there.
However, if it’s a rule on the data, like the date of birth that needs to be entered, then it should go on the table as a constraint.
It’s good practice to place the validation as close to the data as possible. This means that any process that adds or updates the data will need to adhere to these rules – all systems, data load processes, scripts run by developers.
It’s simple to add constraints to tables – it’s often done with a single command. Adding screen validation is harder. Not difficult, but it’s a bit more code, and prone to failing tests.
Adding constraints to the database will also enforce data integrity when applications are updated or changed.
Mistake 8: Poor Indexing
Indexes on a database are objects that allow certain queries to run more efficiently.
They’re not a silver bullet for performance – they don’t solve every issue.
There are two mistakes that are often done when it comes to indexes:
- No indexes at all
- Too many indexes, or indexes on every field
Not having indexes at all is a missed opportunity. A database that’s used by an application will often have many queries that read data from the table. These queries are likely to join to other tables, use filters, and ordering. If so, then these queries are good candidates for having some of their fields indexed. Without the index, as the table grows in size with more data, then the queries will likely get slower and slower.
Having too many indexes can also be an issue for databases. Indexes help with the reading of data from a table, but slow down the process of adding and updating data, because the indexes need to be updated each time.
While it might be tempting to add indexes to every field of the table, doing so can slow down your database.
There’s a healthy middle ground that we should be aiming for. I’ve written a guide to creating indexes which you can read here, but in short, if a query joins on a field or the field is used in a WHERE clause, then consider creating an index on it.
Mistake 9: Poor or Inconsistent Naming Conventions
When you design tables and their columns, you have quite a bit of freedom when giving them names. Sure, there are limits to the length of their name (30 characters in Oracle), but you can name them whatever you like.
This can cause issues with the design. It can cause column names and table names to be inconsistent.
I recommend sticking with a standard naming convention when you name your tables, columns, and other objects on the database.
For example, if you name one of your tables using the singular form of the word (e.g. “customer”), then use the singular form for all database tables (“order”, “employee”, “product”). Don’t mix and match plural table names (“orders”, “employee”, “products”).
If the names are inconsistent, it makes it harder to understand the design, and hard for developers who work on the tables, as they won’t be able to remember which format is being used.
The same rule applies to columns, such as:
- “id” or “<table>_id”
- “account_no” or “account_number”
- “first_name” or “fname”
Pick a standard and stick with it.
Personally, I prefer singular table names. But, if you disagree or if you’re using a different format, that’s OK. Just make sure it’s consistent across your database.
Mistake 10: Using Spaces or Quotes in Table Names
Another mistake that can be made when designing your database is the characters in the names itself.
Most database systems allow you to have spaces in your table names, or to use quotes to surround your table names.
This can mean you have table names or field names such as:
- “customer order”
- “Products”
- “First_Name”
- “account number”
While this is possible, I strongly advise against it.
Adding spaces or quotes to your name makes it harder to query the table. It goes against the naming consistency that you should be following, as mentioned above.
Using quotes means that the name of the table is stored in the case you have defined it (e.g. Products), making it harder to query the table, because you need to add quotes to your queries each time:
SELECT id, name
FROM “Products”;
Also, adding spaces is unnecessary and makes it harder to query the table. You’ll also need to add quotes every time
SELECT id, “First Name”
FROM customer;
So what should you do instead?
Name your tables and columns without quotes. They aren’t needed and should be avoided.
Also, use underscores instead of spaces (e.g. first_name or account_no). Or avoid them altogether (e.g. firstname or accountno).
Mistake 11: Using Reserved Words for Table and Column Names
As I’ve mentioned above, you have a bit of freedom when choosing the names for your tables and columns.
As long as you avoid quotes, spaces, and follow a naming convention, you should be OK, right?
Well, there’s another mistake you should avoid.
In all database systems, there is a range of words that are used by the system. They could be keywords for data types or used in queries.
You should avoid using these words as table names or column names, because they may cause confusion with users and developers, and may even cause queries not to run.
Some of these reserved words are:
- Order
- User
- Table
- Primary
- Where
- Group
You should choose different words if you want to use these for your table names or column names.
You can use them in combination with other words or characters. For example, you can have a field called order_id, because that word is not reserved. But calling a column “order” may cause issues with your queries.
The good news is that in many IDEs these keywords are highlighted or bolded or shown in a different colour. So, while you may not notice it as you design your database, when it comes time to create the tables, it should be noticeable.
Mistake 12: Columns Are Too Small or Too Large
Part of designing your database is determining the size of the columns that the data is stored in.
Some data types, such as dates, don’t need to store a size, because the size is already defined.
However, some data types, such as character and number types, need to have a size.
A mistake that is sometimes made is creating a table with a column that is either too large or too small.
A column can be too large if it allows for a lot more data that is needed.
For example, Oracle allows you to create a column as a VARCHAR2 with a 4,000 character limit.
But just because you can have that limit, doesn’t mean you should.
The size of a column should be determined by the data it needs to store. And a column of that size indicates that it needs to store a large text value, such as a notes or comments field. It allows for a lot of flexibility for applications that use this column, which can be a bad thing if you want to move this data to another database or system.
For example, say you have this VARCHAR2(4000) column. It’s only supposed to store a person’s first name. But it allows for 4,000 characters. Then, you need to extract this data into a data warehouse. If the field allows for 4,000 characters, then your data warehouse and all of the steps in between will need to allow for this length as well – even if the largest value is 100 characters long.
A similar issue can happen with a field that is too small. If you declare a VARCHAR2 field with only 10 characters, and one day you need to store 15, then there will be an issue. It’s hard to know how long a text field needs to be without being too large.
So, to work out the correct size of a field, ask what the field will be used for. Ask for some sample data if you can, and work out the length of it. Ask if it’s likely the values will get any bigger. Sometimes the field can follow a standard (e.g. states in the USA) or perhaps it can be a reasonable length with a bit extra (e.g. country names).
Mistake 13: Using the Wrong Data Types
When you design a database, there comes a point where you need to determine what data types each column should have.
Generally, there are three types: number, text, and date.
But within those types, there are many more. Oracle has many data types, and other databases have different but similar types.
Choosing the right type comes down to your knowledge of the data that’s being stored and your experience and knowledge of the database data types.
Knowing what data is being stored will help you determine what data type to use. For example, are account numbers always numeric? Then store them as a number field. If not, then store it as some kind of character value.
Storing data in the wrong format can cause issues with storing the data and retrieving it later.
For example, phone numbers should be stored as a text value. This is because they can often have a leading zero, which is removed when it is stored as a number. Also, there is no need to perform arithmetic on a phone number, so it doesn’t need to be stored as a number.
Mistake 14: Not Considering Time Zones
Date storage and processing in databases can be powerful. But it can also get a bit complicated.
All database systems allow for some kind of date or datetime data to be stored. This is great for simple dates that need to be stored such as date of birth.
There are also some other data types that allow for greater precision, down to fractional seconds, which is great for log file entries.
But, you should find out from your team if timezones need to be considered.
Will your database need to be accessed and used by people in different time zones?
If so, you may need to consider using data types that support time zones.
There are a few of these data types in Oracle, and other database systems, that make processing dates and times in different time zones easier and efficient.
So, find out if you need to support time zones, and use applicable data types if you do. Oracle has plenty of timezone functions to help you work with these data types as well.
Mistake 15: Not Considering Audit Tables
A common request from business users when designing a system is to record what changes are made, by who, and when they are made.
This might not be something that they identify immediately, or even know that they want until they see data being changed.
But it’s a good question to ask the business users.
If they have a need or a want to see when data is changed, then you may need to build an audit table. These are similar to the base table that is modified, with some extra information on it.
These often capture:
- The primary key of the record that was changed
- All of the data at the time before it was changed
- Which user made the change
- When the change was made (date and time)
The details of this approach depend on how you want to design the system. One way is to add a record to the audit table whenever the main table’s record is changed, and the old data is added to the audit table. That way you can compare the current record in the main table to each of the audit records.
Mistake 16: Using Hard Deletes Instead Of Soft Deletes
A common feature of systems is the ability to delete records. For example, some things that are often deleted are:
- Inactive users
- Old products
- Old metadata (statuses, categories)
However, it’s often a good idea to use a “soft delete” instead of a “hard delete”.
A hard delete is where you delete data from the database tables, by using a DELETE statement. The data is removed and is hard to recover.
DELETE FROM table
WHERE id = 51;
A soft delete is where the data still exists in the database, but is not visible or active in the system that uses it. One way to do this is to use an “active” flag, and have users “deactivate” records. This way, the data is still in the system in case it is related to old records (e.g. historic orders that refer to old products are not ruined), and can be re-activated if needed.
UPDATE table
SET active = 0
WHERE id = 51;
It can be up to the system how to display it, if at all.
So, it’s usually better to use soft deletes instead of hard deletes.
Mistake 17: Not Taking Advantage of Database Features
There are a lot of features in your database system (such as Oracle) that go beyond just creating tables and storing data.
If you don’t consider them and take advantage of them, then the job of a developer is quite a bit harder.
Some examples are:
- Using check constraints to enforce values for a field
- Using a sequence to generate primary key values
- Use user roles and privileges to enforce access
- Create views to limit access and visibility of data if required
While some of these relate to the development work that will be done on the database, it’s good to be aware of them while designing them.
Mistake 18: Lack of Documentation
Documenting your database can be very helpful for your project team and anyone who needs to work with it in the future.
Have you ever tried to work with a database, and had trouble working out the tables that were used and how they were related? Or what the stored procedures were supposed to do?
Documenting your database will prevent this.
So, create an ERD of your database, and keep it up to date. Add comments to your code in functions or stored procedures.
You will thank yourself later when you look at your database again and try to work out how it was designed and what it does. Other developers and team members will also be grateful when they work on the database.
Mistake 19: One Table for All Domain Values (Entity Key Value)
Another common mistake (which I’ve also made before) is the concept of a table to hold all similar values.
Let’s say you have a range of status values for different areas. You have an account status, an order status, and a payment status. These all have different values to each other (e.g. account statuses don’t apply to order statuses).
You might look at those and think, “Hey, these are all status values, why don’t I store them all in the same table, with a column that identifies which table they relate to or what type they are?”
This would mean a single table like this:
- Entity or table (e.g. account, order, payment)
- Key (1, 2, 3, a unique value for the entity or even the whole table)
- Value (Draft, Pending, Paid, etc)
While this might seem simpler, as all data is in one table, it does cause some problems.
It doesn’t enforce referential integrity, as there is no easy way to ensure statuses that are relevant for a table are related to that table. We can’t relate the primary key of this to the account table, for example, and ensure that only account-related statuses are selected.
The table could also get quite large. And it would need to be filtered every time it is queried, based on the entity that is being searched, adding extra overhead.
While it takes a bit more work during setup, creating separate tables for similar data values like this is a better idea. You can easily relate them to the required tables. For example, an account_status, order_status, and payment_status table could be used. Or, depending on your data, they could be fields in an existing table.
For more information on one-to-many relationships, watch this YouTube video:
Mistake 20: Not Using Stored Procedures
The final database design mistake I see is avoiding stored procedures at all costs.
There’s a debate within the developer community about stored procedures. Some say they are unnecessary and all data updates and retrievals should be done using queries in the application. Others say that all data actions should be done close to the database so they can be controlled.
My advice is not to dismiss the idea of using stored procedures just because you have heard they are bad. They are useful in some situations, such as ensuring a calculation is performed consistently.
The team should decide whether or not to use stored procedures for some tasks. Oracle PL/SQL (and other database programming languages) have very powerful features for creating stored procedures, so if you can use them, I recommend them.
Mistake 21: Not Testing with Real Scenarios
The next mistake I’d like to mention is not testing your design with a real scenario or real data.
As you normalise your design, you might think that it’s correct and handles what the business needs.
But until you run a real scenario, or a few real scenarios through it, then it might not be complete.
I encountered this on a recent project I was on. We were designing a new system for telecommunications products, and it needed to handle all kinds of variations and optional extras that could be purchased.
One of the developers asked the person who designed the database if it could handle a specific complicated scenario. It made sense, and as the business analyst, I understood it. It involved a range of different products, each of which had different extras and different options and different allowances, and they needed to be able to add up all of the data and call allowances for these products.
The person who designed the database realised that their current design couldn’t handle that scenario. While they thought that it could handle the simple scenario, one that they thought was real, it wasn’t until an experienced developer or someone who knew the business area better, was able to run it through a real and more complicated scenario, that they realised their design wasn’t complete.
So, make sure you run a few different scenarios through your design to make sure it can cover everything it needs to. It may not need to cover every single possibility, it just needs to cover what your requirements say it needs to.
Mistake 22: Not Using Unique Constraints Where Needed
Consider that you need to create a customer table. You need to store details such as their name, address, and tax ID. You’ve determined that you are going to create an id field for their primary key, so your table looks like this:
You create a primary key constraint on the ID column, as that is your primary key. This means that every record should be unique, right?
This works well until you want to look up a customer from a tax ID. A tax ID in this system is the customer’s unique tax identification number, issued by the government. This could be a Social Security Number, Tax File Number, Australian Business Number, depending on where you’re based.
However, you find that there are three customers with the same tax ID. How is this possible? According to the government rules, these numbers should be unique. So how did this duplicate data get into the database if you have a unique primary key?
It happened because even though your primary key is unique, the other fields are not. So these three customers would have different id values, but the same tax_id values. This can cause issues in your database and in the application, as business rules aren’t being enforced.
The solution to this is to use unique constraints on columns that need to be unique. In this example, you would define a unique constraint on the tax_id column. This would mean your primary key would be a way of identifying the row in this table and used by foreign keys, and the tax_id would also be unique.
Related: How to Remove Duplicate Records in SQL.
Mistake 23: Allowing Two Possible Parents for a Record
An issue that can appear in some database designs is the ability to have two possible parents for a record.
One example of this is a product order for a company. These orders can be placed by customers, but can also be placed by employees of the company who want to order the product as well.
An ERD could look like this:
It’s not possible in SQL to create a foreign key that refers to one or more tables, as shown in the diagram above. An alternative would be to create two columns that refer to each of the tables:
This may work. But how do you ensure that only one of them is populated? An order can’t be for a customer and an employee.
You could try to add a record into the customer table that represents an employee, but then you’ll be entering data twice. Let’s assume that we don’t want to do that.
The solution to this issue is to create two joining tables. This will allow you to capture orders placed by a customer and orders placed by an employee, without breaking any referential integrity issues. You would have a customer_orders table that links customers to orders, and an emplyoee_orders table that links employees and orders.
An ERD would look like this:
This works because the order data is still captured, there is no issue of a dual foreign key. It is still possible for an order to be associated with both a customer and an employee, but at least the referential integrity is preserved.
Mistake 24: Storing Different Optional Types of Data in Columns
One database design issue I have seen is storing different optional values in different columns in a table. A common example of this is phone numbers.
Let’s say you set up a table that needs phone numbers, such as customers. There may be several different phone numbers to capture:
- Home phone
- Business phone
- Mobile phone
- After hours phone
One way to store customer phone numbers is to create separate columns for these:
However, these fields may be all optional. A customer may not have a home phone but would have a business phone. They may only have a mobile phone. This would mean that some or all of these fields would be empty.
Also, what happens if you want to add a new type of phone number? Perhaps they have a second mobile number, or you want to store an assistant’s number. If they are stored as separate columns, you’ll have to add a new column to the table each time.
A better way to do this is to store the phone numbers and types in separate tables. You can have a table of phone number types, such as mobile and business. You can then have a joining table that stores phone numbers for each customer for each type. An ERD would look like this:
This would give you the flexibility to store only those numbers that are needed, as well as adding new types in the future if required.
Summary
Designing a database can be hard to do well. There’s a lot of questions to ask, and a lot of scenarios to imagine, to ensure you’re creating a database design and database that meets the needs of the business, performs well, and is easy to maintain. But if you avoid the mistakes mentioned in this article, you’ll be on your way to designing a great database.
What database design mistakes have you seen or made? Leave a comment below!
Your explanation of topics is always so clear and concise. The videos that you provide are extremely easy to follow and have helped me immensely with my new position as a database developer for a university. Thank you so much!
Thanks Emil, glad they have been helpful! If you have any suggestions for topics you want to learn, let me know here or on the contact form!
Hi Ben,
As always, you are expounding on issues many take for granted or not at all. Kudos to you sir.
I was just reviewing a scenario from a writer, using Oracle 18c, regarding the DISTINCT keyword used in a composite query statement. In an attempt to solve the following query I learned a thing or two:
–WRITE A SQL QUERY TO GET THE NTH HIGHEST SALARY AMONG ALL EMPLOYEES
The employee table has five columns: ID, Dept, Empl_first_nm. Empl_last_nm, Salary
The original solution provided by the writer listed the following:
SELECT *
FROM employee e1
WHERE N-1 = (
SELECT COUNT(DISTINCT(e2.salary))
FROM employee e2
WHERE e2.salary > e1.salary
);
‘N’ refers to the ‘Nth’ highest salary level requested.
The query ran correctly as expected; however, upon modifying the data and PURPOSELY supplying a duplicate record, I found the query did NOT provide an accurate count.
It listed two records with identical data which surprised me , at first.
The distinct keyword in the inner correlated subquery was not negating the duplicate record.
It was not until I placed the DISTINCT keyword in the ‘outer’ query that it behaved as expected.
Also I did note that DISTINCT did provide correct outcome when applied to VALUE as opposed to ROW. This conflicts with your description in your book and I wanted to gain your insight regarding this issue.
I felt I should mention this as mistakes are bound to happen and it is so imperative to thoroughly review your work before submitting it for presentation.
Can you elaborate on the DISTINCT keyword when utilized in composite scenarios? I believe you are very qualified to explain this so I presented my experience hoping for a better understanding.
Thank you
Vic
Regarding Mistake 24 and assuming that the phone_type field phone types are Home, Business and Mobile , how does your ERD account for a Customer with two mobile phone numbers?
Good question Marcus! You could insert a second record into the customer_phone table with the same customer_id, same phone_type_id, and a different phone number. You would then have the two mobile phone numbers for the same customer.