When you’re learning SQL or database design, it’s helpful to use other databases as a reference.
Many articles online refer to Oracle’s HR database, or SQL Server’s AdventureWorks database. These can be helpful, but often you’re looking for answers that these databases don’t help with.
Which is where this post comes in.
This post describes a sample database containing data about video game sales.
It includes:
- An ERD (entity relationship diagram) for the sample video game database
- An explanation of the tables and columns
- A download of sample data to create and populate this database
- An example query on the database
Why is this helpful? Firstly, you can understand more about how a video game sales database might work.
Also, you can practice SQL against realistic data and write your own queries, both simple (how many platforms was The Sims released on?) and complex (which 2010-released game had the most sales in Europe?).
So, let’s take a look at the database.
Sample Video Games Database: ERD
The ERD for the sample video games database is shown here:
This database stores information about the top 11,000 or so video games that have had the most sales of all time. It captures video game sales in North America, Europe, Japan, and other regions grouped as “Other”. It has games from various platforms: PC, recent consoles, and old consoles.
The sample data was obtained by a free online data source. There are about 11,000 games, 31 platforms, 577 publishers, and 4 regions in this dataset.
What do all of these tables and columns mean?
Table Explanations
Genre
This table has a short list of game genres, such as Strategy or Racing. It’s helpful if you want to see what genre a game is, or the types of genres that are making sales.
Game
A list of all of the games in the database. It only contains a title, and a genre_id to link to the genre table. There is no concept of a series in this data (e.g. “Call of Duty” or “Need for Speed”).
Publisher
A company that publishes video games.
Game_Publisher
This is a joining table that defines the games and their publishers, because in the source data, a game can have many publishers. One example is different publishers work with the game on different platforms (e.g. one on PC and one on XBOX).
Platform
A list of the different gaming platforms in the source data. For example, PC, Playstation 3, XBOX360, Nintendo.
Game_Platform
This table links the games, publishers, and the platform. Using this table, you can see the games, who published them, and on which platform (when you join to the related tables).
It also includes the release_year. Why is the release_year here and not on the game table?
This is because, in the source data, a game can be released by different publishers on a different platform in a different year. For example, SimCity 2000 was released on PC in 1992 and PS in 1996.
Region
This lists 4 different regions that sales were captured in: North America, Europe, Japan, and Others.
Region_Sales
This table shows all sales made for games in each region. It’s the main table used to calculate the sales for queries.
The num_sales column is the number of sales of the game in the region, and the number is in millions.
Design Comments
There were a few considerations and limitations in the source data that resulted in this database design:
- The release year applies to a combination of game, platform, and publisher, not just for a game as I assumed.
- The sales are captured overall as at a point in time and are not broken down by years.
- A game can have different publishers for different platforms.
- The data includes games released in 2016 and a couple of games released in 2017.
Sample Data
I’ve prepared some sample data for this database. You can use this to create this database on your own computer, explore the tables, and write SQL on it.
The sample data is available for Oracle, SQL Server, MySQL, and Postgres, and is stored on my GitHub repository. Find out how to access it and load the data here: Sample Data for SQL Databases
Sample Query
With the sample data in the database, let’s take a look at a query on the video game data.
This query will show the top-selling video games, by platform, and some information about them.
SELECT
g.game_name,
pl.platform_name,
gp.release_year,
pub.publisher_name,
SUM(rs.num_sales) AS global_sales
FROM region_sales rs
INNER JOIN region r ON rs.region_id = r.id
INNER JOIN game_platform gp ON rs.game_platform_id = gp.id
INNER JOIN game_publisher gpub ON gp.game_publisher_id = gpub.id
INNER JOIN game g ON gpub.game_id = g.id
INNER JOIN platform pl ON gp.platform_id = pl.id
INNER JOIN publisher pub ON gpub.publisher_id = pub.id
GROUP BY g.game_name, pl.platform_name, gp.release_year, pub.publisher_name
ORDER BY SUM(rs.num_sales) DESC;
Results (top 20 only):
game_name | platform_name | release_year | publisher_name | global_sales |
Wii Sports | Wii | 2006 | Nintendo | 82.74 |
Super Mario Bros. | NES | 1985 | Nintendo | 40.24 |
Mario Kart Wii | Wii | 2008 | Nintendo | 35.83 |
Wii Sports Resort | Wii | 2009 | Nintendo | 33 |
Pokemon Red/Pokemon Blue | GB | 1996 | Nintendo | 31.38 |
Tetris | GB | 1989 | Nintendo | 30.26 |
New Super Mario Bros. | DS | 2006 | Nintendo | 30.01 |
Wii Play | Wii | 2006 | Nintendo | 29.01 |
New Super Mario Bros. Wii | Wii | 2009 | Nintendo | 28.61 |
Duck Hunt | NES | 1984 | Nintendo | 28.31 |
Nintendogs | DS | 2005 | Nintendo | 24.75 |
Mario Kart DS | DS | 2005 | Nintendo | 23.43 |
Pokemon Gold/Pokemon Silver | GB | 1999 | Nintendo | 23.09 |
Wii Fit | Wii | 2007 | Nintendo | 22.72 |
Wii Fit Plus | Wii | 2009 | Nintendo | 22 |
Kinect Adventures! | X360 | 2010 | Microsoft Game Studios | 21.82 |
Grand Theft Auto V | PS3 | 2013 | Take-Two Interactive | 21.39 |
Grand Theft Auto: San Andreas | PS2 | 2004 | Take-Two Interactive | 20.81 |
Super Mario World | SNES | 1990 | Nintendo | 20.62 |
Brain Age: Train Your Brain in Minutes a Day | DS | 2005 | Nintendo | 20.22 |
Conclusion
So that’s the sample database for video game sales. There’s an ERD you can use to help you understand it or to design your own. You can also download the sample database tables and data to run your own queries on it.
Hi, I’m practicing oracle sql now. Can you provide an oracle sql for this sample db?
Hope for response thanks
Hi Forte, you should be able to enter your email address in the form on the page and you’ll get sent the scripts and instructions for Oracle SQL.
Hi, I’ve entered my email and got the download link but I takes me to a blank converkit page and the download never starts
Hi Damian, I’ve emailed you with a link to the files. I think there was some issue with my email provider that showed this page.
Hey, I’m learning how to use access and was wondering if you could provide the access sample?
Hi, I don’t have a sample of this for Microsoft Access, but you may be able to use the SQL Server or MySQL versions and make some adjustments for it to work in Access.
Someone who has created a gamerside with this sql database. ??
If you have a brief description of the game as well that would be great. I’m building a website using a game database. I want a brief description of the game, do you have any suggestions?
Can you do a M:N relationship with this and which one could you do it with
You could. The region_sales table is a representation of the many to many relationship between region and game_platform. This is because a region has sales in many platforms, and a platform has sales in many regions.
Can you provide query for this database like sample query