FB

When you’re learning SQL or database design, it’s helpful to use other databases as a reference.

Many articles online refer to Oracle’s HR database, or SQL Server’s AdventureWorks database. These can be helpful, but often you’re looking for answers that these databases don’t help with.

Which is where this post comes in.

This post describes a sample database containing data about video game sales.

It includes:

  • An ERD (entity relationship diagram) for the sample video game database
  • An explanation of the tables and columns
  • A download of sample data to create and populate this database
  • An example query on the database

Why is this helpful? Firstly, you can understand more about how a video game sales database might work.

Also, you can practice SQL against realistic data and write your own queries, both simple (how many platforms was The Sims released on?) and complex (which 2010-released game had the most sales in Europe?).

So, let’s take a look at the database.

 

Sample Video Games Database: ERD

The ERD for the sample video games database is shown here:

Sample Data - Video Games

This database stores information about the top 11,000 or so video games that have had the most sales of all time. It captures video game sales in North America, Europe, Japan, and other regions grouped as “Other”. It has games from various platforms: PC, recent consoles, and old consoles.

The sample data was obtained by a free online data source. There are about 11,000 games, 31 platforms, 577 publishers, and 4 regions in this dataset.

What do all of these tables and columns mean?

 

Table Explanations

Genre

This table has a short list of game genres, such as Strategy or Racing. It’s helpful if you want to see what genre a game is, or the types of genres that are making sales.

Game

A list of all of the games in the database. It only contains a title, and a genre_id to link to the genre table. There is no concept of a series in this data (e.g. “Call of Duty” or “Need for Speed”).

Publisher

A company that publishes video games.

Game_Publisher

This is a joining table that defines the games and their publishers, because in the source data, a game can have many publishers. One example is different publishers work with the game on different platforms (e.g. one on PC and one on XBOX).

Platform

A list of the different gaming platforms in the source data. For example, PC, Playstation 3, XBOX360, Nintendo.

Game_Platform

This table links the games, publishers, and the platform. Using this table, you can see the games, who published them, and on which platform (when you join to the related tables).

It also includes the release_year. Why is the release_year here and not on the game table?

This is because, in the source data, a game can be released by different publishers on a different platform in a different year. For example, SimCity 2000 was released on PC in 1992 and PS in 1996.

Region

This lists 4 different regions that sales were captured in: North America, Europe, Japan, and Others.

Region_Sales

This table shows all sales made for games in each region. It’s the main table used to calculate the sales for queries.

The num_sales column is the number of sales of the game in the region, and the number is in millions.

 

Design Comments

There were a few considerations and limitations in the source data that resulted in this database design:

  • The release year applies to a combination of game, platform, and publisher, not just for a game as I assumed.
  • The sales are captured overall as at a point in time and are not broken down by years.
  • A game can have different publishers for different platforms.
  • The data includes games released in 2016 and a couple of games released in 2017.

 

Sample Data

I’ve prepared some sample data for this database. You can use this to create this database on your own computer, explore the tables, and write SQL on it.

This file is set up to run on MySQL.

The file is available here:

sample_data_videogames.sql (2 MB)

It’s a single SQL file that creates the video_games database, creates the tables, and populates them.

You can run the SQL file itself, or use MySQL Workbench’s Data Import feature to import it.

If you’d like to run this on Oracle or SQL Server, enter your email address below and I’ll send the links to the files.

Download the Sample Data for this Database for

Oracle or SQL Server

 

Sample Query

With the sample data in the database, let’s take a look at a query on the video game data.

This query will show the top-selling video games, by platform, and some information about them.

SELECT
g.game_name,
pl.platform_name,
gp.release_year,
pub.publisher_name,
SUM(rs.num_sales) AS global_sales
FROM region_sales rs
INNER JOIN region r ON rs.region_id = r.id
INNER JOIN game_platform gp ON rs.game_platform_id = gp.id
INNER JOIN game_publisher gpub ON gp.game_publisher_id = gpub.id
INNER JOIN game g ON gpub.game_id = g.id
INNER JOIN platform pl ON gp.platform_id = pl.id
INNER JOIN publisher pub ON gpub.publisher_id = pub.id
GROUP BY g.game_name, pl.platform_name, gp.release_year, pub.publisher_name
ORDER BY SUM(rs.num_sales) DESC;

Results (top 20 only):

game_nameplatform_namerelease_yearpublisher_nameglobal_sales
Wii SportsWii2006Nintendo82.74
Super Mario Bros.NES1985Nintendo40.24
Mario Kart WiiWii2008Nintendo35.83
Wii Sports ResortWii2009Nintendo33
Pokemon Red/Pokemon BlueGB1996Nintendo31.38
TetrisGB1989Nintendo30.26
New Super Mario Bros.DS2006Nintendo30.01
Wii PlayWii2006Nintendo29.01
New Super Mario Bros. WiiWii2009Nintendo28.61
Duck HuntNES1984Nintendo28.31
NintendogsDS2005Nintendo24.75
Mario Kart DSDS2005Nintendo23.43
Pokemon Gold/Pokemon SilverGB1999Nintendo23.09
Wii FitWii2007Nintendo22.72
Wii Fit PlusWii2009Nintendo22
Kinect Adventures!X3602010Microsoft Game Studios21.82
Grand Theft Auto VPS32013Take-Two Interactive21.39
Grand Theft Auto: San AndreasPS22004Take-Two Interactive20.81
Super Mario WorldSNES1990Nintendo20.62
Brain Age: Train Your Brain in Minutes a DayDS2005Nintendo20.22

 

Conclusion

So that’s the sample database for video game sales. There’s an ERD you can use to help you understand it or to design your own. You can also download the sample database tables and data to run your own queries on it.

Download the Sample Data for Oracle or SQL Server