Do you want to practice SQL on something other than the basic default database? Here’s a sample ERD and data for a superhero database.
When you’re learning SQL or database design, it’s helpful to use other databases as a reference.
Many articles online refer to Oracle’s HR database, or SQL Server’s AdventureWorks database. These can be helpful, but often you’re looking for answers that these databases don’t help with.
Which is where this post comes in.
This post describes a sample database containing data about superheroes. It includes:
- An ERD (entity relationship diagram) for the sample superhero database
- An explanation of the tables and columns
- A download of sample data to create and populate this database
- An example query on the database
Why is this helpful? Firstly, you can understand more about how a superhero database might work.
Also, you can practice SQL against realistic data and write your own queries, both simple (which superheroes were created by Marvel) and complex (which superpowers are more prevalent in DC than in Marvel).
So, let’s take a look at the database.
Sample Superhero Database: ERD
The ERD for the superhero database is shown here (open in new tab to see a larger version):
It contains about 700 superhero entries from several online sources. Superheroes also have a list of attributes (such as Intelligence and Combat) and a set of superpowers.
However, due to the nature of the source data, not all superheroes have attribute values or superpowers. It’s not ideal, but it’s good practice to deal with incomplete data and working with NULL values.
What do all of these tables and columns mean?
This is the main list of superheroes. It has their superhero name, full name/real name, a list of IDs linking to other tables, and their height (in centimetres) and weight (in kilograms).
The Gender table lists three rows: Male, Female, and N/A.
The Colour table contains references for eye colour, skin colour, and hair colour. There’s a row that indicates “no colour” as well.
This table shows values for the different races of superheroes, such as Human.
The Publisher table lists all of the publishers. It includes the two most common: DC and Marvel. It also includes other publishers such as the “Heroes” TV show and Star Trek.
This table shows three values that demonstrate how the superhero is aligned: Good, Neutral, and Bad.
This table lists six different attributes (e.g. intelligence) that can describe a superhero.
This table is the intersection of attributes and superheroes. A superhero can have multiple attributes, and for each attribute, they have a value from 0-100 indicating their rating. This table contains those ratings (in the attribute_value column).
The superpower table lists all of the available abilities or superpowers that someone can have.
This table lists the IDs of each power that each superhero can have because a superhero can have more than one superpower.
There were a few things I had to consider when designing this database:
- Not all superheroes have attribute values
- Not all superheroes have superpower values
- Some superheroes are mentioned multiple times, such as Green Lantern who has had different people take on the role of Green Lantern.
- Many pieces of information were left out of this data set to keep it simple, such as alias names, birthplaces, and family members.
I’ve prepared some sample data for this database. You can use this to create this database on your own computer, explore the tables, and write SQL on it.
The sample data is available for Oracle, SQL Server, MySQL, and Postgres, and is stored on my GitHub repository. Find out how to access it and load the data here: Sample Data for SQL Databases
With the sample data in the database, let’s take a look at a sample query. This query shows all of the superheroes and the sum of their attributes, including those with no attributes.
SELECT s.id, s.superhero_name, p.publisher_name, SUM(ha.attribute_value) AS total_attributes FROM superhero s INNER JOIN publisher p ON s.publisher_id = p.id LEFT JOIN hero_attribute ha ON s.id = ha.hero_id LEFT JOIN attribute a ON ha.attribute_id = a.id GROUP BY s.id, s.superhero_name, p.publisher_name ORDER BY SUM(ha.attribute_value) DESC;
Results (top 20):
|284||General Zod||DC Comics||600|
|541||Power Girl||DC Comics||600|
|441||Man of Miracles||Image Comics||600|
|683||The Presence||DC Comics||600|
|424||Living Tribunal||Marvel Comics||600|
|82||Ben 10||DC Comics||590|
|474||Mister Mxyzptlk||DC Comics||585|
|622||Silver Surfer||Marvel Comics||580|
There are all kinds of other queries you can write on this data.
So that’s the sample database for superheroes. There’s an ERD you can use to help you understand it or to design your own. You can also download the sample database tables and data to run your own queries on it.