Sample Database: Video Games (ERD and SQL)

When you're learning SQL or database design, it's helpful to use other databases as a reference.

Many articles online refer to Oracle's HR database, or SQL Server's AdventureWorks database. These can be helpful, but often you're looking for answers that these databases don't help with.

Which is where this post comes in.

This post describes a sample database containing data about video game sales.

It includes:

  • An ERD (entity relationship diagram) for the sample video game database
  • An explanation of the tables and columns
  • A download of sample data to create and populate this database
  • An example query on the database

Why is this helpful? Firstly, you can understand more about how a video game sales database might work.

Also, you can practice SQL against realistic data and write your own queries, both simple (how many platforms was The Sims released on?) and complex (which 2010-released game had the most sales in Europe?).

So, let's take a look at the database.

Sample Video Games Database: ERD

The ERD for the sample video games database is shown here:

Sample Data - Video Games

This database stores information about the top 11,000 or so video games that have had the most sales of all time. It captures video game sales in North America, Europe, Japan, and other regions grouped as "Other". It has games from various platforms: PC, recent consoles, and old consoles.

The sample data was obtained by a free online data source. There are about 11,000 games, 31 platforms, 577 publishers, and 4 regions in this dataset.

What do all of these tables and columns mean?

 

Table Explanations

Genre

This table has a short list of game genres, such as Strategy or Racing. It's helpful if you want to see what genre a game is, or the types of genres that are making sales.

Game

A list of all of the games in the database. It only contains a title, and a genre_id to link to the genre table. There is no concept of a series in this data (e.g. "Call of Duty" or "Need for Speed").

Publisher

A company that publishes video games.

Game_Publisher

This is a joining table that defines the games and their publishers, because in the source data, a game can have many publishers. One example is different publishers work with the game on different platforms (e.g. one on PC and one on XBOX).

Platform

A list of the different gaming platforms in the source data. For example, PC, Playstation 3, XBOX360, Nintendo.

Game_Platform

This table links the games, publishers, and the platform. Using this table, you can see the games, who published them, and on which platform (when you join to the related tables).

It also includes the release_year. Why is the release_year here and not on the game table?

This is because, in the source data, a game can be released by different publishers on a different platform in a different year. For example, SimCity 2000 was released on PC in 1992 and PS in 1996.

Region

This lists 4 different regions that sales were captured in: North America, Europe, Japan, and Others.

Region_Sales

This table shows all sales made for games in each region. It's the main table used to calculate the sales for queries.

The num_sales column is the number of sales of the game in the region, and the number is in millions.

 

Design Comments

There were a few considerations and limitations in the source data that resulted in this database design:

  • The release year applies to a combination of game, platform, and publisher, not just for a game as I assumed.
  • The sales are captured overall as at a point in time and are not broken down by years.
  • A game can have different publishers for different platforms.
  • The data includes games released in 2016 and a couple of games released in 2017.

 

Sample Data

I've prepared some sample data for this database. You can use this to create this database on your own computer, explore the tables, and write SQL on it.

The sample data is available for Oracle, SQL Server, MySQL, and Postgres, and is stored on my GitHub repository. Find out how to access it and load the data here: Sample Data for SQL Databases

 

Sample Query

With the sample data in the database, let's take a look at a query on the video game data.

This query will show the top-selling video games, by platform, and some information about them.

 1SELECT
 2g.game_name,
 3pl.platform_name,
 4gp.release_year,
 5pub.publisher_name,
 6SUM(rs.num_sales) AS global_sales
 7FROM region_sales rs
 8INNER JOIN region r ON rs.region_id = r.id
 9INNER JOIN game_platform gp ON rs.game_platform_id = gp.id
10INNER JOIN game_publisher gpub ON gp.game_publisher_id = gpub.id
11INNER JOIN game g ON gpub.game_id = g.id
12INNER JOIN platform pl ON gp.platform_id = pl.id
13INNER JOIN publisher pub ON gpub.publisher_id = pub.id
14GROUP BY g.game_name, pl.platform_name, gp.release_year, pub.publisher_name
15ORDER BY SUM(rs.num_sales) DESC;

Results (top 20 only):

game_name platform_name release_year publisher_name global_sales
Wii Sports Wii 2006 Nintendo 82.74
Super Mario Bros. NES 1985 Nintendo 40.24
Mario Kart Wii Wii 2008 Nintendo 35.83
Wii Sports Resort Wii 2009 Nintendo 33
Pokemon Red/Pokemon Blue GB 1996 Nintendo 31.38
Tetris GB 1989 Nintendo 30.26
New Super Mario Bros. DS 2006 Nintendo 30.01
Wii Play Wii 2006 Nintendo 29.01
New Super Mario Bros. Wii Wii 2009 Nintendo 28.61
Duck Hunt NES 1984 Nintendo 28.31
Nintendogs DS 2005 Nintendo 24.75
Mario Kart DS DS 2005 Nintendo 23.43
Pokemon Gold/Pokemon Silver GB 1999 Nintendo 23.09
Wii Fit Wii 2007 Nintendo 22.72
Wii Fit Plus Wii 2009 Nintendo 22
Kinect Adventures! X360 2010 Microsoft Game Studios 21.82
Grand Theft Auto V PS3 2013 Take-Two Interactive 21.39
Grand Theft Auto: San Andreas PS2 2004 Take-Two Interactive 20.81
Super Mario World SNES 1990 Nintendo 20.62
Brain Age: Train Your Brain in Minutes a Day DS 2005 Nintendo 20.22

 

Conclusion

So that's the sample database for video game sales. There's an ERD you can use to help you understand it or to design your own. You can also download the sample database tables and data to run your own queries on it.

comments powered by Disqus