When you’re learning SQL or database design, it’s helpful to use other databases as a reference.
Many articles online refer to Oracle’s HR database, or SQL Server’s AdventureWorks database. These can be helpful, but often you’re looking for answers that these databases don’t help with.
Which is where this post comes in.
This post describes a sample database containing data about Olympic events and competitors.
- An ERD (entity relationship diagram) for the sample Olympics database
- An explanation of the tables and columns
- A download of sample data to create and populate this database
- An example query on the database
Why is this helpful? Firstly, you can understand more about how an Olympic competitor database might work.
Also, you can practice SQL against realistic data and write your own queries, both simple (who won medals at the 2004 Olympics) and complex (which competitors won a medal in their first Olympic games).
So, let’s take a look at the database.
Sample Olympic Database: ERD
The ERD for the Olympic database is shown here (click to open large version):
It contains about 260,000 separate results for competitors at all events since 1896, including both summer and winter Olympics. It has all the results: not only the medals but those who competed and did not win a medal. All events for each sport are included.
What do all of these tables and columns mean?
This table contains a list of all sports in the Olympics: both summer sports and winter sports.
The event table contains all of the different events for each sport. For example, if a sport is “Alpine Skiing”, then an event is “1KM Men’s Alpine Skiing”. There are multiple events per sport, and events are split into Men, Women, and Mixed.
This represents a list of many of the cities in the world.
The games table lists all of the Olympic Games since 1896m, the year they were held, and whether they were Summer or Olympic games.
Looking at this data, I learned that the Summer and Winter Olympics used to be in the same year. I’ve only ever known them to be in alternating years.
This table shows all of the cities for each Olympic games. This is a joining table between games and city.
Why do we need a joining table? Aren’t the Olympics always in one city?
Well, as I was compiling this data, I got some duplicate rows in my results. I traced this back to a particular set of events for one Olympics. It turns out that in 1956, the Summer Olympics was in Melbourne for most of the events, but due to Australia’s strict horse quarantine rules, they couldn’t bring the horses for the equestrian events. So they performed all of the Equestrian events in Stockholm earlier in 1956.
So, the 1956 Olympics was in both Melbourne and Stockholm.
I didn’t realise this was the case until I looked at the data.
It’s a good lesson: we shouldn’t assume we know how data is structured until we look at the details.
This contains a list of NOC (National Olympic Committee) codes and their names. This translates roughly to a country that competes in the Olympics.
This table lists all people that have competed in an Olympic games. It has their name, gender, height (in CM) and weight (in KG). The height and weight did not differ between Olympic games. If it did, it would have been stored in a different table.
This is a joining table that lists all people and the NOC regions (countries) they competed for. This captures the fact that some people competed for more than one country, which is another scenario I didn’t realise until I looked at the data.
This table is a joining table that relates a person to an Olympic games, which shows who competed what each Olympic games.
This small table lists the different medals available: Gold, Silver, Bronze, and N/A.
This table lists the combination of competitors (the people and the games they competed in), the event they competed in, and the medal (if any) they received. This is the largest table.
There were a few things I had to consider when designing this database, and some were mentioned earlier:
- It’s possible for an Olympic games to be held in two different cities (e.g. Melbourne and Stockholm in 1956).
- A person can compete for two different countries in two different Olympics.
- A person’s height and weight may change between Olympics in reality, but in the data I had, they did not change.
I’ve prepared some sample data for this database. You can use this to create this database on your own computer, explore the tables, and write SQL on it.
The sample data is available for Oracle, SQL Server, MySQL, and Postgres, and is stored on my GitHub repository. Find out how to access it and load the data here: Sample Data for SQL Databases
With the sample data in the database, let’s take a look at a sample query. This query shows the people who have won the most medals of any type (Gold, Silver, Bronze) at the Olympic Games.
SELECT p.id, p.full_name, COUNT(*) FROM person p INNER JOIN games_competitor gc ON p.id = gc.person_id INNER JOIN competitor_event ce ON gc.id = ce.competitor_id INNER JOIN medal m ON ce.medal_id = m.id WHERE medal_name IN ('Gold', 'Silver', 'Bronze') GROUP BY p.id, p.full_name ORDER BY COUNT(*) DESC;
Results (top 20):
|94406||Michael Fred Phelps, II||28|
|67046||Larysa Semenivna Latynina (Diriy-)||18|
|4198||Nikolay Yefimovich Andrianov||15|
|109161||Borys Anfiyanovych Shakhlin||13|
|11951||Ole Einar Bjrndalen||13|
|85286||Aleksey Yuryevich Nemov||12|
|70965||Ryan Steven Lochte||12|
|87390||Paavo Johannes Nurmi||12|
|119922||Jennifer Elisabeth “Jenny” Thompson (-Cumpelik)||12|
|23426||Natalie Anne Coughlin (-Hall)||12|
|121258||Dara Grace Torres (-Hoffman, -Minas)||12|
|89706||Carl Townsend Osburn||11|
|11642||Matthew Nicholas “Matt” Biondi||11|
|18826||Vra slavsk (-Odloilov)||11|
|21402||Viktor Ivanovych Chukarin||11|
|113912||Mark Andrew Spitz||11|
There are all kinds of other queries you can write on this data.
So that’s the sample database for the Olympic Games. There’s an ERD you can use to help you understand it or to design your own. You can also download the sample database tables and data to run your own queries on it.