Software developers are one of the main groups of people who work with databases.
Recently, I asked a range of software developer blog owners to share their advice for working with databases.
I asked them all a single question:
If you could offer one piece of advice to developers on working with databases, what would it be?
Here are their answers.
I suggested they could provide as much or as little as they like, which is why some answers are shorter than others. All of the responses contain helpful advice for developers working with databases.
Adam Bien
Website: adam-bien.com
Decide early what is more important to you: the structure of the database or your code’s design. Otherwise, you will waste a lot of time, and one of them will suffer anyway.
Ahmad Awais
Website: ahmadawais.com
Make sure you use an excellent layer of rate-limiting with any type of caching for databases that allow public access. I can’t overstate how many times only having a good rate-limiting security has saved me from thousands of dollars of AWS bills.
Alexey Golub
Website: tyrrrz.me/blog
I wouldn’t really consider myself a database expert. But I would say the biggest advice I could give to someone is to learn about specifics of the engine they’re using and recognize its strengths and weaknesses. For example, just because it’s an SQL database, doesn’t mean it’s going to work the same as every other SQL database out there. Some people consider knowing how their database works an implementation detail and that it’s best to hide it behind a thick layer of abstraction so you can swap it out when you need, but I disagree. A database is not an implementation detail, but a very important and potentially defining aspect of your application.
Andrew Lock
Website: andrewlock.net
Always make sure you understand what your code is actually doing when it interacts with the database. If you’re using an ORM, make sure you understand (and check) the SQL it’s writing is what you would expect. If you’re using code-first to handle migrating your database by creating tables and columns, make sure the schema generated is what you expect. Issues that arise due to an incorrect data type, or mis-named column can be costly to find and correct!
Bozhidar Bozhanov
Website: techblog.bozho.net
If I had to give developers working with databases one advice, it would be: make it secure. Every other thing you’ll figure in time – how to structure your tables, how to use ORM, how to optimize queries, how to use indexes, how to do multitenancy. But security may not be on the list of requirements and it may be too late when the need becomes obvious.
So I’d focus on several things:
- Prevent SQL injections – make sure you use an ORM or prepared statements rather than building queries with string concatenation. Otherwise a malicious actor can inject anything in your queries and turn them into a DROP DATABASE query, or worse – one that exfiltrates all the data.
- Support encryption in transit – this often has to be supported by the application’s driver configuration, e.g. by trusting a particular server certificate. Unencrypted communication, even within the same datacenter, is a significant risk and that’s why databases support encryption in transit. (You should also think about encryption at rest, but that’s more of an Ops task)
- Have an audit log at the application level – “who did what” is a very important question from a security and compliance point of view. And no native database functionality can consistently answer the question “who” – it’s the application that manages users. So build an audit trail layer that records who did what changes to what entities/tables.
- Consider record-level encryption for sensitive data – a database can be dumped in full by those who have access (or gain access maliciously). This is how data breaches happen. Sensitive data (like health data, payment data, or even API keys, secrets or tokens) benefits from being encrypted with an application-managed key, so that access to the database alone doesn’t reveal that data. Another option, often used for credit cards, is tokenization, which shifts the encryption responsibility to the tokenization providers. Managing the keys is hard, but even a basic approach is better than nothing.
Security is often viewed as an “operations” responsibility, and this has lead to a lot of tools that try to solve the above problem without touching the application – web application firewalls, heuristics for database access monitoring, trying to extract the current user, etc. But the application is the right place for many of these protections (although certainly not the only place), and as developers we need to be aware of the risks and best practices.
Chris Hawkes
Website: hipstercode.com
When building and designing databases, sometimes it’s best to just start off with a flat file system until you know exactly what you’re building. In 2020, MongoDB is a joy to work with, easy to get started and NoSQL is the best choice for big data.
Dave Cheney
Website: dave.cheney.net
In a word, don’t require a database for local development.
Dave also provided a link to this tweet, which states “You shouldn’t need a database to run unit tests.” There are some helpful discussions and replies on it.
https://twitter.com/davecheney/status/1253201549779587073
David Thomas
Website: pragdave.me
Resist the temptation to treat databases as some kind of seamless extension of your code. They aren’t. Databases are different in just about every way imaginable, and trying to hide them behind abstractions just adds an extra layer of complexity: ultimately you still need to access the database itself to get stuff done.
The corollary is simple: you have to learn your database in the same way you have to learn your programming language. Only then will it stop surprising you, and only then can you use it properly.
Eric Sink
Website: ericsink.com
Learn as much as you can about implementing the database system itself, not just implementing things that use it. The more you know about what is happening under the hood, the more effectively you can drive and maintain the car.
Javin Paul
Website: javarevisited.blogspot.com
Always query with NOLOCK option.
This can cause a production issue if some job is also running and trying to update the same table you are querying. By saying NOLOCK you reduce the risk of blocking and deadlock
Read more on Javin’s post here.
Jeff Atwood
Website: blog.codinghorror.com
When doing a statement like:
update table set field = "value"
This will replace the field in every row, even if the field already contains that value! Instead, you should do this:
update table set field = "value" where field <> "value"
That way only the rows where the value in that field is not already identical will be touched, rather than every row in the entire table, which can be very painful on big tables.
Jens Boje
Website: codeboje.de
I give 2 advice:
- KISS – keep it simple, don’t overengineer your DB and don’t normalize to the death. Redundancy is ok sometimes.
- Choose a database you are familiar with and never ever pick a fancy new one for a new production project. It’ll end in a mess and you’ll do too much wrong.
And as a bonus. Don’t follow fads, stick to the time-tested ones like Postgres.
Jiří Činčura
Website: tabsoverspaces.com
For *developers* I would recommend learning SQL and not relying only on abstractions (like ORMs).
Khalil Stemmler
Website: khalilstemmler.com
If I had to offer one piece of advice for developers, I’d say to focus on persistence later. I’d first recommend building the core of your application and making sure your acceptance tests pass with an in-memory database.
When you’re sure your use cases work and the tests pass, then I’d hook it up to persistence and ensure it works with integration tests. This approach keeps your tests fast and helps you delay making a decision as to what database(s) to use.
Mat Ryer
Website: pace.dev
This might be obvious, but I’d focus on the user experience impact of database decisions.
Remember to consider the end user experience when making database decisions. Databases aren’t just an internal technical consideration, they usually have a direct impact on the end user experience. When you’re selecting for either fast reads or fast writes, consider the UX in each case.
For example, posting a comment doesn’t happen as often as people reading comments, so it’s probably better to optimise for fast reads (tricks like optimistic UIs can help it still feel like it’s writing quickly).
In a low-latency telemetry system with lots of data coming in, you might find you want the writes to finish as quickly as possible, and maybe it’s ok if it takes a while to read it back when using that data.
Michael Shpilt
Website: michaelscodingspot.com
Each database has its strengths and weaknesses. There’s no one database to rule them all. Make sure you choose the right DB for your purposes and don’t rule out using several databases for your application.
Mike Perham
Website: mikeperham.com
There is no better skill to learn for your engineering career than SQL. Get to know join types, subqueries, indexes, stored procedures, et al and you’ll find yourself using them all the time. SQL is deep enough that there is almost always something more to learn even after years of professional experience.
Mitch Tabian
Website: codingwithmitch.com
Don’t use SQLite for a production website.
Neel Bhatt
Website: neelbhatt.com
I would advise them to always think about security, backup, performance and improve the database architecture if needed.
Nicolas Fränkel
Website: blog.frankel.ch
One word: KYD (alias for Know Your Database).
15 years ago, it was pretty obvious. Nearly all widespread databases were SQL. To be a bit blunt, the only key differentiator was the vendor you paid support to.
Nowadays, this has changed significantly. In the realm of NoSQL databases, there are many choices: free-text search databases, document databases, column-oriented databases, graph databases, key-value databases, etc. Even not all SQL databases are equal: some have been proven to be very consistent, some offer GIS capabilities, some are Open Source, some easily allow leader-follower replication, etc.
You definitely need to think about your requirements, and use the relevant database(s), or it’s going to be a mess down the road.
Paul Stovell
Website: paulstovell.com
Here’s some advice for individual developers.
There’s a perfectly good language for working with SQL. It’s called SQL! :-) Don’t be afraid to embrace it, especially when it comes to more complex queries or migrations. Don’t rely on the ORM.
And another:
Developers often jump through hoops to abstract their code from the database in case they ever switch databases. In 10 years of consulting, we never switched databases – as a rule, it’s generally a waste of time.
For my own company, we actually did switch databases – twice! And you know, it wasn’t that bad. And in both cases, the databases were so different (justifying the switch) that an abstraction never helped anyway.
And for teams:
You’ve all done this – you’re in the middle of a production deployment, and there’s a SQL change that needs to be made – some kind of migration script. You email it to your DBA, who opens it in Management Studio, and proceeds to read the script, executing it one statement at a time, stopping occasionally to shake their head as they read your SQL.
In this moment, two things are happening:
- the DBA – the expert on the database – is reviewing your code. This is necessary! The DBA is a font of knowledge and wisdom and their review can add enormous value.
- the DBA is executing the code – necessary because only the DBA has production database access.
But, these two things are happening at absolutely the wrong time. It’s the middle of a production deployment! There’s a lot of pressure, and no time to make large changes to the script if the DBA sees a problem anyway.
Teams should try hard to decouple these two steps. DBA’s should be involved in SQL reviews way earlier – say in deployments to test environments, or as reviewers on a pull request. They should see the exact same migration script run in every dev and test deployment. There’s plenty of time to review the SQL, make suggestions, test the changes, and address concerns.
When it’s production deployment time, the DBA shouldn’t need to be there. You should have executed the exact same script a dozen times in all of your dev and test deployments prior to the production deployment. The script will not have changed. It should be safe for any member of the team to click “deploy” with confidence that that script – and ONLY that script – will run against production.
In doing this, DBA will be able to spend their time where they add the most value, without the high pressure environment of a production deployment.
Peter Krumins
Website: catonmat.net
No matter what, stick with MySQL!
Reinder de Vries
Website: learnappmaking.com
If you’re new to databases, I recommend you start with learning good old plain SQL. This puts you close to the core principles of most database platforms and tools, such as queries, filtering with WHERE, and joins. A lot of relational database platforms are a riff on “select title from books where author = ‘J.K. Rowling'”. You miss out on that simplicity if you go straight to working with objects or database abstraction layers. Get your bearings with SQL first, and then use that as a stepping stone for more advanced tools and platforms.
Rémi Bourgarel
Website: remibou.github.io
Trust reality/production, not your beliefs. All the decisions you make must be based on the information you got from your production server, not from what you think would be helpful:
- Query optimization: most RDBMS do a lot of optimization in the background.
- Index creation: maybe the index you thought about at dev time is not used at all because the table is too small or RDBMS decided to use another index.
- Take the execution plan from the production DB, not from your local env, it’s useless.
- Use an ORM everywhere but hot path: the RDBMS data should show you where it’s a problem.
Robert Martin
Website: cleancoder.com
Decouple your behavioral model from your data model. If at all possible design your business objects first, based upon their behavior. Design the database schema later and keep it decoupled from the business objects.
Roy Kim
Website: roykim.ca
Develop with performance in mind. Clustering indexes, query performance. I have seen so many apps being developed by programmers that performance is not fully tested and thought out. And get burned at the last minute with many performance issues.
Sam Gavis-Hughson
Website: byte-by-byte.com
Take time to go back to the basics. So often, we get wrapped up in learning new things and getting more and more complex that we forget the fundamentals. Regularly reviewing the basics is a really good way to stay grounded.
Troy Hunt
Website: troyhunt.com
Minimise data collection – you cannot lose what you do not have. Stop collecting data such as date of birth when there’s very rarely a valid reason to do so.
Vlad Mihalcea
Website: vladmihalcea.com
Read the manual. You’ll be amazed how much you miss if you didn’t read it.
Conclusion
You’ll probably agree that there are a lot of great tips in here!
What are your thoughts on these tips? Let me know in the comments below.
Also, if you have a personal blog that focuses on programming, let me know if you’d like to contribute!
DMA would laugh and cry
Globally using NOLOCK is a horrible idea and leads to concurrency issues. If you have trouble with blocking use RCSI. There are times to use it when debugging but never use it in production where the data has to be right. NOLOCK = “maybe my data is right”.
Thanks for providing more info! I’ve never used NOLOCK when querying in SQL Server but good to understand why it’s not a good idea.
You should almost NEVER use NOLOCK.
Take into account data volume scalability in your code. For example, avoid looping in your program. It may still give acceptable performance for small data volume but will suffer when there is a huge data growth, and you are not maximizing the compute power of the database. A slight drop in latency in the network or database will have multiplier effect on the performance. Batch processing will be the solution for this example.
This short article really is a concentration of great advice.
Not all of it is golden, e.g. “No matter what, stick with MySQL!” and “Always query with NOLOCK option” are bad advice at best, dangerously myopic and insulated at worst.
In general, “learn actual SQL”, “Learn the pros and cons before selecting a database”, and “separate your business objects from your specific database (not necessarily using an ORM)” are covered here should be the golden rules for developers making database choices.
I would also add, “understand the basics of how indexes impact performance”, the details of which surprise many developers and can differ greatly even between seemingly similar systems like Microsoft SQL vs. Postgres.
The first one is a gem. Especially in code-intensive DBs with much business logic inside. For instance, chained views may be bad for performance and perf tuning, but they aid code reuse and readability greatly. So does having as little points of data materialization as possible. Knowing techniques that maintain performance while keeping the codebase manageable is so important.
I would advise them to always think about security, backup, performance, and improve the database architecture if needed.
“Don’t use SQLite for a production website.” Is it serious?
Depends on the website or project or the data process you have and requeriements. Too many people use dbms for tiny projects or tiny websites with low traffic where a few simple texts files, even only one, suffice. If you need a db for those projects, use Sqlite.
Thanks for the feedback!