FB

When you’re learning SQL or database design, it’s helpful to use other databases as a reference.

Many articles online refer to Oracle’s HR database, or SQL Server’s AdventureWorks database. These can be helpful, but often you’re looking for answers that these databases don’t help with.

Which is where this post comes in.

This post describes a sample database containing data about video game sales.

It includes:

  • An ERD (entity relationship diagram) for the sample video game database
  • An explanation of the tables and columns
  • A download of sample data to create and populate this database
  • An example query on the database

Why is this helpful? Firstly, you can understand more about how a video game sales database might work.

Also, you can practice SQL against realistic data and write your own queries, both simple (how many platforms was The Sims released on?) and complex (which 2010-released game had the most sales in Europe?).

So, let’s take a look at the database.

 

Sample Video Games Database: ERD

The ERD for the sample video games database is shown here:

Sample Data - Video Games

This database stores information about the top 11,000 or so video games that have had the most sales of all time. It captures video game sales in North America, Europe, Japan, and other regions grouped as “Other”. It has games from various platforms: PC, recent consoles, and old consoles.

The sample data was obtained by a free online data source. There are about 11,000 games, 31 platforms, 577 publishers, and 4 regions in this dataset.

What do all of these tables and columns mean?

 

Table Explanations

Genre

This table has a short list of game genres, such as Strategy or Racing. It’s helpful if you want to see what genre a game is, or the types of genres that are making sales.

Game

A list of all of the games in the database. It only contains a title, and a genre_id to link to the genre table. There is no concept of a series in this data (e.g. “Call of Duty” or “Need for Speed”).

Publisher

A company that publishes video games.

Game_Publisher

This is a joining table that defines the games and their publishers, because in the source data, a game can have many publishers. One example is different publishers work with the game on different platforms (e.g. one on PC and one on XBOX).

Platform

A list of the different gaming platforms in the source data. For example, PC, Playstation 3, XBOX360, Nintendo.

Game_Platform

This table links the games, publishers, and the platform. Using this table, you can see the games, who published them, and on which platform (when you join to the related tables).

It also includes the release_year. Why is the release_year here and not on the game table?

This is because, in the source data, a game can be released by different publishers on a different platform in a different year. For example, SimCity 2000 was released on PC in 1992 and PS in 1996.

Region

This lists 4 different regions that sales were captured in: North America, Europe, Japan, and Others.

Region_Sales

This table shows all sales made for games in each region. It’s the main table used to calculate the sales for queries.

The num_sales column is the number of sales of the game in the region, and the number is in millions.

 

Design Comments

There were a few considerations and limitations in the source data that resulted in this database design:

  • The release year applies to a combination of game, platform, and publisher, not just for a game as I assumed.
  • The sales are captured overall as at a point in time and are not broken down by years.
  • A game can have different publishers for different platforms.
  • The data includes games released in 2016 and a couple of games released in 2017.

 

Sample Data

I’ve prepared some sample data for this database. You can use this to create this database on your own computer, explore the tables, and write SQL on it.

This file is set up to run on MySQL.

The file is available here:

sample_data_videogames.sql (2 MB)

It’s a single SQL file that creates the video_games database, creates the tables, and populates them.

You can run the SQL file itself, or use MySQL Workbench’s Data Import feature to import it.

If you’d like to run this on Oracle or SQL Server, enter your email address below and I’ll send the links to the files.

Download the Sample Data for this Database for

Oracle or SQL Server

 

Sample Query

With the sample data in the database, let’s take a look at a query on the video game data.

This query will show the top-selling video games, by platform, and some information about them.

SELECT
g.game_name,
pl.platform_name,
gp.release_year,
pub.publisher_name,
SUM(rs.num_sales) AS global_sales
FROM region_sales rs
INNER JOIN region r ON rs.region_id = r.id
INNER JOIN game_platform gp ON rs.game_platform_id = gp.id
INNER JOIN game_publisher gpub ON gp.game_publisher_id = gpub.id
INNER JOIN game g ON gpub.game_id = g.id
INNER JOIN platform pl ON gp.platform_id = pl.id
INNER JOIN publisher pub ON gpub.publisher_id = pub.id
GROUP BY g.game_name, pl.platform_name, gp.release_year, pub.publisher_name
ORDER BY SUM(rs.num_sales) DESC;

Results (top 20 only):

game_name platform_name release_year publisher_name global_sales
Wii Sports Wii 2006 Nintendo 82.74
Super Mario Bros. NES 1985 Nintendo 40.24
Mario Kart Wii Wii 2008 Nintendo 35.83
Wii Sports Resort Wii 2009 Nintendo 33
Pokemon Red/Pokemon Blue GB 1996 Nintendo 31.38
Tetris GB 1989 Nintendo 30.26
New Super Mario Bros. DS 2006 Nintendo 30.01
Wii Play Wii 2006 Nintendo 29.01
New Super Mario Bros. Wii Wii 2009 Nintendo 28.61
Duck Hunt NES 1984 Nintendo 28.31
Nintendogs DS 2005 Nintendo 24.75
Mario Kart DS DS 2005 Nintendo 23.43
Pokemon Gold/Pokemon Silver GB 1999 Nintendo 23.09
Wii Fit Wii 2007 Nintendo 22.72
Wii Fit Plus Wii 2009 Nintendo 22
Kinect Adventures! X360 2010 Microsoft Game Studios 21.82
Grand Theft Auto V PS3 2013 Take-Two Interactive 21.39
Grand Theft Auto: San Andreas PS2 2004 Take-Two Interactive 20.81
Super Mario World SNES 1990 Nintendo 20.62
Brain Age: Train Your Brain in Minutes a Day DS 2005 Nintendo 20.22

 

Conclusion

So that’s the sample database for video game sales. There’s an ERD you can use to help you understand it or to design your own. You can also download the sample database tables and data to run your own queries on it.

Download the Sample Data for Oracle or SQL Server