Software developers are one of the main groups of people who work with databases.
Recently, I asked a range of software developer blog owners to share their advice for working with databases.
I asked them all a single question:
If you could offer one piece of advice to developers on working with databases, what would it be?
Here are their answers.
I suggested they could provide as much or as little as they like, which is why some answers are shorter than others. All of the responses contain helpful advice for developers working with databases.
Decide early what is more important to you: the structure of the database or your code’s design. Otherwise, you will waste a lot of time, and one of them will suffer anyway.
Make sure you use an excellent layer of rate-limiting with any type of caching for databases that allow public access. I can’t overstate how many times only having a good rate-limiting security has saved me from thousands of dollars of AWS bills.
I wouldn’t really consider myself a database expert. But I would say the biggest advice I could give to someone is to learn about specifics of the engine they’re using and recognize its strengths and weaknesses. For example, just because it’s an SQL database, doesn’t mean it’s going to work the same as every other SQL database out there. Some people consider knowing how their database works an implementation detail and that it’s best to hide it behind a thick layer of abstraction so you can swap it out when you need, but I disagree. A database is not an implementation detail, but a very important and potentially defining aspect of your application.
Always make sure you understand what your code is actually doing when it interacts with the database. If you’re using an ORM, make sure you understand (and check) the SQL it’s writing is what you would expect. If you’re using code-first to handle migrating your database by creating tables and columns, make sure the schema generated is what you expect. Issues that arise due to an incorrect data type, or mis-named column can be costly to find and correct!
If I had to give developers working with databases one advice, it would be: make it secure. Every other thing you’ll figure in time – how to structure your tables, how to use ORM, how to optimize queries, how to use indexes, how to do multitenancy. But security may not be on the list of requirements and it may be too late when the need becomes obvious.
So I’d focus on several things:
- Prevent SQL injections – make sure you use an ORM or prepared statements rather than building queries with string concatenation. Otherwise a malicious actor can inject anything in your queries and turn them into a DROP DATABASE query, or worse – one that exfiltrates all the data.
- Support encryption in transit – this often has to be supported by the application’s driver configuration, e.g. by trusting a particular server certificate. Unencrypted communication, even within the same datacenter, is a significant risk and that’s why databases support encryption in transit. (You should also think about encryption at rest, but that’s more of an Ops task)
- Have an audit log at the application level – “who did what” is a very important question from a security and compliance point of view. And no native database functionality can consistently answer the question “who” – it’s the application that manages users. So build an audit trail layer that records who did what changes to what entities/tables.
- Consider record-level encryption for sensitive data – a database can be dumped in full by those who have access (or gain access maliciously). This is how data breaches happen. Sensitive data (like health data, payment data, or even API keys, secrets or tokens) benefits from being encrypted with an application-managed key, so that access to the database alone doesn’t reveal that data. Another option, often used for credit cards, is tokenization, which shifts the encryption responsibility to the tokenization providers. Managing the keys is hard, but even a basic approach is better than nothing.
Security is often viewed as an “operations” responsibility, and this has lead to a lot of tools that try to solve the above problem without touching the application – web application firewalls, heuristics for database access monitoring, trying to extract the current user, etc. But the application is the right place for many of these protections (although certainly not the only place), and as developers we need to be aware of the risks and best practices.
When building and designing databases, sometimes it’s best to just start off with a flat file system until you know exactly what you’re building. In 2020, MongoDB is a joy to work with, easy to get started and NoSQL is the best choice for big data.
In a word, don’t require a database for local development.
Dave also provided a link to this tweet, which states “You shouldn’t need a database to run unit tests.” There are some helpful discussions and replies on it.
Resist the temptation to treat databases as some kind of seamless extension of your code. They aren’t. Databases are different in just about every way imaginable, and trying to hide them behind abstractions just adds an extra layer of complexity: ultimately you still need to access the database itself to get stuff done.
The corollary is simple: you have to learn your database in the same way you have to learn your programming language. Only then will it stop surprising you, and only then can you use it properly.
Learn as much as you can about implementing the database system itself, not just implementing things that use it. The more you know about what is happening under the hood, the more effectively you can drive and maintain the car.
Always query with NOLOCK option.
This can cause a production issue if some job is also running and trying to update the same table you are querying. By saying NOLOCK you reduce the risk of blocking and deadlock
Read more on Javin’s post here.
When doing a statement like:
update table set field = "value"
This will replace the field in every row, even if the field already contains that value! Instead, you should do this:
update table set field = "value" where field <> "value"
That way only the rows where the value in that field is not already identical will be touched, rather than every row in the entire table, which can be very painful on big tables.
I give 2 advice:
- KISS – keep it simple, don’t overengineer your DB and don’t normalize to the death. Redundancy is ok sometimes.
- Choose a database you are familiar with and never ever pick a fancy new one for a new production project. It’ll end in a mess and you’ll do too much wrong.
And as a bonus. Don’t follow fads, stick to the time-tested ones like Postgres.
For *developers* I would recommend learning SQL and not relying only on abstractions (like ORMs).
If I had to offer one piece of advice for developers, I’d say to focus on persistence later. I’d first recommend building the core of your application and making sure your acceptance tests pass with an in-memory database.
When you’re sure your use cases work and the tests pass, then I’d hook it up to persistence and ensure it works with integration tests. This approach keeps your tests fast and helps you delay making a decision as to what database(s) to use.
This might be obvious, but I’d focus on the user experience impact of database decisions.
Remember to consider the end user experience when making database decisions. Databases aren’t just an internal technical consideration, they usually have a direct impact on the end user experience. When you’re selecting for either fast reads or fast writes, consider the UX in each case.
For example, posting a comment doesn’t happen as often as people reading comments, so it’s probably better to optimise for fast reads (tricks like optimistic UIs can help it still feel like it’s writing quickly).
In a low-latency telemetry system with lots of data coming in, you might find you want the writes to finish as quickly as possible, and maybe it’s ok if it takes a while to read it back when using that data.
Each database has its strengths and weaknesses. There’s no one database to rule them all. Make sure you choose the right DB for your purposes and don’t rule out using several databases for your application.
There is no better skill to learn for your engineering career than SQL. Get to know join types, subqueries, indexes, stored procedures, et al and you’ll find yourself using them all the time. SQL is deep enough that there is almost always something more to learn even after years of professional experience.
Don’t use SQLite for a production website.
I would advise them to always think about security, backup, performance and improve the database architecture if needed.
One word: KYD (alias for Know Your Database).
15 years ago, it was pretty obvious. Nearly all widespread databases were SQL. To be a bit blunt, the only key differentiator was the vendor you paid support to.
Nowadays, this has changed significantly. In the realm of NoSQL databases, there are many choices: free-text search databases, document databases, column-oriented databases, graph databases, key-value databases, etc. Even not all SQL databases are equal: some have been proven to be very consistent, some offer GIS capabilities, some are Open Source, some easily allow leader-follower replication, etc.
You definitely need to think about your requirements, and use the relevant database(s), or it’s going to be a mess down the road.
Here’s some advice for individual developers.
There’s a perfectly good language for working with SQL. It’s called SQL! :-) Don’t be afraid to embrace it, especially when it comes to more complex queries or migrations. Don’t rely on the ORM.
Developers often jump through hoops to abstract their code from the database in case they ever switch databases. In 10 years of consulting, we never switched databases – as a rule, it’s generally a waste of time.
For my own company, we actually did switch databases – twice! And you know, it wasn’t that bad. And in both cases, the databases were so different (justifying the switch) that an abstraction never helped anyway.
And for teams:
You’ve all done this – you’re in the middle of a production deployment, and there’s a SQL change that needs to be made – some kind of migration script. You email it to your DBA, who opens it in Management Studio, and proceeds to read the script, executing it one statement at a time, stopping occasionally to shake their head as they read your SQL.
In this moment, two things are happening:
- the DBA – the expert on the database – is reviewing your code. This is necessary! The DBA is a font of knowledge and wisdom and their review can add enormous value.
- the DBA is executing the code – necessary because only the DBA has production database access.
But, these two things are happening at absolutely the wrong time. It’s the middle of a production deployment! There’s a lot of pressure, and no time to make large changes to the script if the DBA sees a problem anyway.
Teams should try hard to decouple these two steps. DBA’s should be involved in SQL reviews way earlier – say in deployments to test environments, or as reviewers on a pull request. They should see the exact same migration script run in every dev and test deployment. There’s plenty of time to review the SQL, make suggestions, test the changes, and address concerns.
When it’s production deployment time, the DBA shouldn’t need to be there. You should have executed the exact same script a dozen times in all of your dev and test deployments prior to the production deployment. The script will not have changed. It should be safe for any member of the team to click “deploy” with confidence that that script – and ONLY that script – will run against production.
In doing this, DBA will be able to spend their time where they add the most value, without the high pressure environment of a production deployment.
No matter what, stick with MySQL!
Reinder de Vries
If you’re new to databases, I recommend you start with learning good old plain SQL. This puts you close to the core principles of most database platforms and tools, such as queries, filtering with WHERE, and joins. A lot of relational database platforms are a riff on “select title from books where author = ‘J.K. Rowling'”. You miss out on that simplicity if you go straight to working with objects or database abstraction layers. Get your bearings with SQL first, and then use that as a stepping stone for more advanced tools and platforms.
Trust reality/production, not your beliefs. All the decisions you make must be based on the information you got from your production server, not from what you think would be helpful:
- Query optimization: most RDBMS do a lot of optimization in the background.
- Index creation: maybe the index you thought about at dev time is not used at all because the table is too small or RDBMS decided to use another index.
- Take the execution plan from the production DB, not from your local env, it’s useless.
- Use an ORM everywhere but hot path: the RDBMS data should show you where it’s a problem.
Decouple your behavioral model from your data model. If at all possible design your business objects first, based upon their behavior. Design the database schema later and keep it decoupled from the business objects.
Develop with performance in mind. Clustering indexes, query performance. I have seen so many apps being developed by programmers that performance is not fully tested and thought out. And get burned at the last minute with many performance issues.
Take time to go back to the basics. So often, we get wrapped up in learning new things and getting more and more complex that we forget the fundamentals. Regularly reviewing the basics is a really good way to stay grounded.
Minimise data collection – you cannot lose what you do not have. Stop collecting data such as date of birth when there’s very rarely a valid reason to do so.
Read the manual. You’ll be amazed how much you miss if you didn’t read it.
You’ll probably agree that there are a lot of great tips in here!
What are your thoughts on these tips? Let me know in the comments below.
Also, if you have a personal blog that focuses on programming, let me know if you’d like to contribute!