FB

When you’re learning SQL or database design, it’s helpful to use other databases as a reference.

Many articles online refer to Oracle’s HR database, or SQL Server’s AdventureWorks database. These can be helpful, but often you’re looking for answers that these databases don’t help with.

Which is where this post comes in.

This post describes a sample database containing data about Olympic events and competitors.

It includes:

  • An ERD (entity relationship diagram) for the sample Olympics database
  • An explanation of the tables and columns
  • A download of sample data to create and populate this database
  • An example query on the database

Why is this helpful? Firstly, you can understand more about how an Olympic competitor database might work.

Also, you can practice SQL against realistic data and write your own queries, both simple (who won medals at the 2004 Olympics) and complex (which competitors won a medal in their first Olympic games).

So, let’s take a look at the database.

 

Sample Olympic Database: ERD

The ERD for the Olympic database is shown here (click to open large version):

Olympics ERD

It contains about 260,000 separate results for competitors at all events since 1896, including both summer and winter Olympics. It has all the results: not only the medals but those who competed and did not win a medal. All events for each sport are included.

What do all of these tables and columns mean?

 

Table Explanations

Sport

This table contains a list of all sports in the Olympics: both summer sports and winter sports.

Event

The event table contains all of the different events for each sport. For example, if a sport is “Alpine Skiing”, then an event is “1KM Men’s Alpine Skiing”. There are multiple events per sport, and events are split into Men, Women, and Mixed.

City

This represents a list of many of the cities in the world.

Games

The games table lists all of the Olympic Games since 1896m, the year they were held, and whether they were Summer or Olympic games.

Looking at this data, I learned that the Summer and Winter Olympics used to be in the same year. I’ve only ever known them to be in alternating years.

Games City

This table shows all of the cities for each Olympic games. This is a joining table between games and city.

Why do we need a joining table? Aren’t the Olympics always in one city?

Well, as I was compiling this data, I got some duplicate rows in my results. I traced this back to a particular set of events for one Olympics. It turns out that in 1956, the Summer Olympics was in Melbourne for most of the events, but due to Australia’s strict horse quarantine rules, they couldn’t bring the horses for the equestrian events. So they performed all of the Equestrian events in Stockholm earlier in 1956.

So, the 1956 Olympics was in both Melbourne and Stockholm.

I didn’t realise this was the case until I looked at the data.

It’s a good lesson: we shouldn’t assume we know how data is structured until we look at the details.

NOC Region

This contains a list of NOC (National Olympic Committee) codes and their names. This translates roughly to a country that competes in the Olympics.

Person

This table lists all people that have competed in an Olympic games. It has their name, gender, height (in CM) and weight (in KG). The height and weight did not differ between Olympic games. If it did, it would have been stored in a different table.

Person Region

This is a joining table that lists all people and the NOC regions (countries) they competed for. This captures the fact that some people competed for more than one country, which is another scenario I didn’t realise until I looked at the data.

Games Competitor

This table is a joining table that relates a person to an Olympic games, which shows who competed what each Olympic games.

Medal

This small table lists the different medals available: Gold, Silver, Bronze, and N/A.

Competitor Event

This table lists the combination of competitors (the people and the games they competed in), the event they competed in, and the medal (if any) they received. This is the largest table.

 

Design Comments

There were a few things I had to consider when designing this database, and some were mentioned earlier:

  • It’s possible for an Olympic games to be held in two different cities (e.g. Melbourne and Stockholm in 1956).
  • A person can compete for two different countries in two different Olympics.
  • A person’s height and weight may change between Olympics in reality, but in the data I had, they did not change.

 

Sample Data

I’ve prepared some sample data for this database. You can use this to create this database on your own computer, explore the tables, and write SQL on it.

This file is set up to run on MySQL.

The file is available here:

sample_data_olympics.sql (13 MB)

It’s a single SQL file that creates the Olympics database, creates the tables, and populates them.

You can run the SQL file itself, or use MySQL Workbench’s Data Import feature to import it.

If you’d like to run this on Oracle or SQL Server, enter your email address below and I’ll send the links to the files.

 

Sample Query

With the sample data in the database, let’s take a look at a sample query. This query shows the people who have won the most medals of any type (Gold, Silver, Bronze) at the Olympic Games.

SELECT
p.id,
p.full_name,
COUNT(*)
FROM person p
INNER JOIN games_competitor gc ON p.id = gc.person_id
INNER JOIN competitor_event ce ON gc.id = ce.competitor_id
INNER JOIN medal m ON ce.medal_id = m.id
WHERE medal_name IN ('Gold', 'Silver', 'Bronze')
GROUP BY p.id, p.full_name
ORDER BY COUNT(*) DESC;

 

Results (top 20):

id full_name COUNT(*)
94406 Michael Fred Phelps, II 28
67046 Larysa Semenivna Latynina (Diriy-) 18
4198 Nikolay Yefimovich Andrianov 15
89187 Takashi Ono 13
109161 Borys Anfiyanovych Shakhlin 13
11951 Ole Einar Bjrndalen 13
74420 Edoardo Mangiarotti 13
85286 Aleksey Yuryevich Nemov 12
70965 Ryan Steven Lochte 12
87390 Paavo Johannes Nurmi 12
119922 Jennifer Elisabeth “Jenny” Thompson (-Cumpelik) 12
23426 Natalie Anne Coughlin (-Hall) 12
57998 Sawao Kato 12
121258 Dara Grace Torres (-Hoffman, -Minas) 12
35550 Birgit Fischer-Schmidt 12
89706 Carl Townsend Osburn 11
11642 Matthew Nicholas “Matt” Biondi 11
18826 Vra slavsk (-Odloilov) 11
21402 Viktor Ivanovych Chukarin 11
113912 Mark Andrew Spitz 11

 

There are all kinds of other queries you can write on this data.

 

Conclusion

So that’s the sample database for the Olympic Games. There’s an ERD you can use to help you understand it or to design your own. You can also download the sample database tables and data to run your own queries on it.