FB

Do you want to practice SQL on something other than the basic default database? Here’s a sample ERD and data for a university ranking database.

When you’re learning SQL or database design, it’s helpful to use other databases as a reference.

Many articles online refer to Oracle’s HR database or SQL Server’s AdventureWorks database. These can be helpful, but often you’re looking for answers that these databases don’t help with.

Which is where this post comes in.

This post describes a sample database containing data about universities. It includes:

  • An ERD (entity relationship diagram) for the sample university ranking database
  • An explanation of the tables and columns
  • A download of sample data to create and populate this database
  • An example query on the database

Why is this helpful? Firstly, you can understand more about how a university ranking database might work.

Also, you can practice SQL against realistic data and write your own queries, both simple (universities exist in which country) and complex (which universities had the largest ranking change from one year to the next).

So, let’s take a look at the database.

 

Sample University Ranking Database: ERD

The ERD for the university rankings database is shown here:

universities_erd

There are a few hundred universities in the sample data, and 21 different ranking criteria across three different systems. The data exists for the years 2005 to 2016.

It doesn’t contain every world university, or data up until 2018 or 2019, but it’s a good set to work with.

What do all of these tables and columns mean?

 

Table Explanations

Let’s look at what the tables and columns mean.

country

This is a simple list of countries that were in the data set. I don’t think it’s a complete list of all countries, but it’s enough for this data set.

university

A list of universities that were ranked in this system. Each university is related to a country.

ranking_system

This contains the three different ranking systems used: Times Higher Education World University Ranking, Shanghai Ranking, and Center for World University Rankings.

ranking_criteria

This table contains the different criteria used in each ranking system, such as Citations and Quality of Education. It also contains criteria for Total Score for each system.

university_year

The university_year table contains values for measurements such as the number of students and the student to staff ratio for each university in several years.

university_ranking_year

This table contains the score for each year for each university and ranking criteria. It’s the largest table in the database.

 

Design Comments

There are a few things to consider when working with this data:

  • Because the data is normalised, you may want to use a PIVOT or subqueries to get scores in a column format, if that’s what you prefer.
  • The measurements in the univeristy_year table could be moved to the ranking_criteria table, as it seems to be storing similar data. But it’s separated as it’s not really related to a ranking criteria.

 

Sample Data

I’ve prepared some sample data for this database. You can use this to create this database on your own computer, explore the tables, and write SQL on it.

This file is set up to run on MySQL.

The file is available here:

sample_data_universities.sql (586 KB)

It’s a single SQL file that creates the university ranking database, creates the tables, and populates them.

You can run the SQL file itself, or use MySQL Workbench’s Data Import feature to import it.

If you’d like to run this on Oracle or SQL Server, enter your email address below and I’ll send the links to the files.

 

Sample Query

With the sample data in the database, let’s take a look at a sample query. This query shows all of the universities and their total scores from 2015

SELECT
u.university_name,
c.country_name,
rc.criteria_name,
ry.year,
ry.score
FROM university u
INNER JOIN country c ON u.country_id = c.id
INNER JOIN university_ranking_year ry ON ry.university_id = u.id
INNER JOIN ranking_criteria rc ON ry.ranking_criteria_id = rc.id
WHERE rc.criteria_name IN (
'Total Times',
'Total Shanghai',
'Total CWUR'
)
AND ry.year = 2015
ORDER BY ry.score DESC;

Results (top 20):

# university_name country_name criteria_name year score
Harvard University United States of America Total Shanghai 2015 100
Harvard University United States of America Total CWUR 2015 100
Stanford University United States of America Total CWUR 2015 99
Massachusetts Institute of Technology United States of America Total CWUR 2015 98
University of Cambridge United Kingdom Total CWUR 2015 97
Columbia University United States of America Total CWUR 2015 96
University of Oxford United Kingdom Total CWUR 2015 96
California Institute of Technology United States of America Total Times 2015 94
University of Oxford United Kingdom Total Times 2015 93
Harvard University United States of America Total Times 2015 93
Stanford University United States of America Total Times 2015 93
University of California, Berkeley United States of America Total CWUR 2015 92
Massachusetts Institute of Technology United States of America Total Times 2015 92
University of Cambridge United Kingdom Total Times 2015 92
University of Chicago United States of America Total CWUR 2015 91
Princeton University United States of America Total Times 2015 91
University of California, Berkeley United States of America Total Times 2015 90
Princeton University United States of America Total CWUR 2015 89
Yale University United States of America Total Times 2015 88
Imperial College London United Kingdom Total Times 2015 88

 

There are all kinds of other queries you can write on this data.

 

Conclusion

So that’s the sample database for university rankings. There’s an ERD you can use to help you understand it or to design your own. You can also download the sample database tables and data to run your own queries on it.