When you’re learning SQL or database design, it’s helpful to use other databases as a reference.
Many articles online refer to Oracle’s HR database, or SQL Server’s AdventureWorks database. These can be helpful, but often you’re looking for answers that these databases don’t help with.
Which is where this post comes in.
This post describes a sample database containing data about video game sales.
- An ERD (entity relationship diagram) for the sample video game database
- An explanation of the tables and columns
- A download of sample data to create and populate this database
- An example query on the database
Why is this helpful? Firstly, you can understand more about how a video game sales database might work.
Also, you can practice SQL against realistic data and write your own queries, both simple (how many platforms was The Sims released on?) and complex (which 2010-released game had the most sales in Europe?).
So, let’s take a look at the database.
Sample Video Games Database: ERD
The ERD for the sample video games database is shown here:
This database stores information about the top 11,000 or so video games that have had the most sales of all time. It captures video game sales in North America, Europe, Japan, and other regions grouped as “Other”. It has games from various platforms: PC, recent consoles, and old consoles.
The sample data was obtained by a free online data source. There are about 11,000 games, 31 platforms, 577 publishers, and 4 regions in this dataset.
What do all of these tables and columns mean?
This table has a short list of game genres, such as Strategy or Racing. It’s helpful if you want to see what genre a game is, or the types of genres that are making sales.
A list of all of the games in the database. It only contains a title, and a genre_id to link to the genre table. There is no concept of a series in this data (e.g. “Call of Duty” or “Need for Speed”).
A company that publishes video games.
This is a joining table that defines the games and their publishers, because in the source data, a game can have many publishers. One example is different publishers work with the game on different platforms (e.g. one on PC and one on XBOX).
A list of the different gaming platforms in the source data. For example, PC, Playstation 3, XBOX360, Nintendo.
This table links the games, publishers, and the platform. Using this table, you can see the games, who published them, and on which platform (when you join to the related tables).
It also includes the release_year. Why is the release_year here and not on the game table?
This is because, in the source data, a game can be released by different publishers on a different platform in a different year. For example, SimCity 2000 was released on PC in 1992 and PS in 1996.
This lists 4 different regions that sales were captured in: North America, Europe, Japan, and Others.
This table shows all sales made for games in each region. It’s the main table used to calculate the sales for queries.
The num_sales column is the number of sales of the game in the region, and the number is in millions.
There were a few considerations and limitations in the source data that resulted in this database design:
- The release year applies to a combination of game, platform, and publisher, not just for a game as I assumed.
- The sales are captured overall as at a point in time and are not broken down by years.
- A game can have different publishers for different platforms.
- The data includes games released in 2016 and a couple of games released in 2017.
I’ve prepared some sample data for this database. You can use this to create this database on your own computer, explore the tables, and write SQL on it.
The sample data is available for Oracle, SQL Server, MySQL, and Postgres, and is stored on my GitHub repository. Find out how to access it and load the data here: Sample Data for SQL Databases
With the sample data in the database, let’s take a look at a query on the video game data.
This query will show the top-selling video games, by platform, and some information about them.
SUM(rs.num_sales) AS global_sales
FROM region_sales rs
INNER JOIN region r ON rs.region_id = r.id
INNER JOIN game_platform gp ON rs.game_platform_id = gp.id
INNER JOIN game_publisher gpub ON gp.game_publisher_id = gpub.id
INNER JOIN game g ON gpub.game_id = g.id
INNER JOIN platform pl ON gp.platform_id = pl.id
INNER JOIN publisher pub ON gpub.publisher_id = pub.id
GROUP BY g.game_name, pl.platform_name, gp.release_year, pub.publisher_name
ORDER BY SUM(rs.num_sales) DESC;
Results (top 20 only):
|Super Mario Bros.
|Mario Kart Wii
|Wii Sports Resort
|Pokemon Red/Pokemon Blue
|New Super Mario Bros.
|New Super Mario Bros. Wii
|Mario Kart DS
|Pokemon Gold/Pokemon Silver
|Wii Fit Plus
|Microsoft Game Studios
|Grand Theft Auto V
|Grand Theft Auto: San Andreas
|Super Mario World
|Brain Age: Train Your Brain in Minutes a Day
So that’s the sample database for video game sales. There’s an ERD you can use to help you understand it or to design your own. You can also download the sample database tables and data to run your own queries on it.