Do you want to practice SQL on something other than the basic default database? Here’s a sample ERD and data for a superhero database.
When you’re learning SQL or database design, it’s helpful to use other databases as a reference.
Many articles online refer to Oracle’s HR database, or SQL Server’s AdventureWorks database. These can be helpful, but often you’re looking for answers that these databases don’t help with.
Which is where this post comes in.
This post describes a sample database containing data about superheroes. It includes:
- An ERD (entity relationship diagram) for the sample superhero database
- An explanation of the tables and columns
- A download of sample data to create and populate this database
- An example query on the database
Why is this helpful? Firstly, you can understand more about how a superhero database might work.
Also, you can practice SQL against realistic data and write your own queries, both simple (which superheroes were created by Marvel) and complex (which superpowers are more prevalent in DC than in Marvel).
So, let’s take a look at the database.
Sample Superhero Database: ERD
The ERD for the superhero database is shown here (open in new tab to see a larger version):
It contains about 700 superhero entries from several online sources. Superheroes also have a list of attributes (such as Intelligence and Combat) and a set of superpowers.
However, due to the nature of the source data, not all superheroes have attribute values or superpowers. It’s not ideal, but it’s good practice to deal with incomplete data and working with NULL values.
What do all of these tables and columns mean?
Table Explanations
Superhero
This is the main list of superheroes. It has their superhero name, full name/real name, a list of IDs linking to other tables, and their height (in centimetres) and weight (in kilograms).
Gender
The Gender table lists three rows: Male, Female, and N/A.
Colour
The Colour table contains references for eye colour, skin colour, and hair colour. There’s a row that indicates “no colour” as well.
Race
This table shows values for the different races of superheroes, such as Human.
Publisher
The Publisher table lists all of the publishers. It includes the two most common: DC and Marvel. It also includes other publishers such as the “Heroes” TV show and Star Trek.
Alignment
This table shows three values that demonstrate how the superhero is aligned: Good, Neutral, and Bad.
Attribute
This table lists six different attributes (e.g. intelligence) that can describe a superhero.
Hero_Attribute
This table is the intersection of attributes and superheroes. A superhero can have multiple attributes, and for each attribute, they have a value from 0-100 indicating their rating. This table contains those ratings (in the attribute_value column).
Superpower
The superpower table lists all of the available abilities or superpowers that someone can have.
Hero_Power
This table lists the IDs of each power that each superhero can have because a superhero can have more than one superpower.
Design Comments
There were a few things I had to consider when designing this database:
- Not all superheroes have attribute values
- Not all superheroes have superpower values
- Some superheroes are mentioned multiple times, such as Green Lantern who has had different people take on the role of Green Lantern.
- Many pieces of information were left out of this data set to keep it simple, such as alias names, birthplaces, and family members.
Sample Data
I’ve prepared some sample data for this database. You can use this to create this database on your own computer, explore the tables, and write SQL on it.
The sample data is available for Oracle, SQL Server, MySQL, and Postgres, and is stored on my GitHub repository. Find out how to access it and load the data here: Sample Data for SQL Databases
Sample Query
With the sample data in the database, let’s take a look at a sample query. This query shows all of the superheroes and the sum of their attributes, including those with no attributes.
SELECT
s.id,
s.superhero_name,
p.publisher_name,
SUM(ha.attribute_value) AS total_attributes
FROM superhero s
INNER JOIN publisher p ON s.publisher_id = p.id
LEFT JOIN hero_attribute ha ON s.id = ha.hero_id
LEFT JOIN attribute a ON ha.attribute_id = a.id
GROUP BY s.id, s.superhero_name, p.publisher_name
ORDER BY SUM(ha.attribute_value) DESC;
Results (top 20):
# id | superhero_name | publisher_name | total_attributes |
284 | General Zod | DC Comics | 600 |
541 | Power Girl | DC Comics | 600 |
441 | Man of Miracles | Image Comics | 600 |
518 | One-Above-All | Marvel Comics | 600 |
665 | Superboy-Prime | DC Comics | 600 |
666 | Supergirl | DC Comics | 600 |
84 | Beyonder | Marvel Comics | 600 |
667 | Superman | DC Comics | 600 |
683 | The Presence | DC Comics | 600 |
424 | Living Tribunal | Marvel Comics | 600 |
28 | Angela | Image Comics | 595 |
348 | Hyperion | Marvel Comics | 595 |
637 | Spectre | DC Comics | 595 |
91 | Binary | Marvel Comics | 595 |
484 | Monarch | DC Comics | 595 |
82 | Ben 10 | DC Comics | 590 |
604 | Sentry | Marvel Comics | 590 |
43 | Ardina | Marvel Comics | 585 |
474 | Mister Mxyzptlk | DC Comics | 585 |
622 | Silver Surfer | Marvel Comics | 580 |
There are all kinds of other queries you can write on this data.
Conclusion
So that’s the sample database for superheroes. There’s an ERD you can use to help you understand it or to design your own. You can also download the sample database tables and data to run your own queries on it.
is it possible to get dataset in postgres format
when run sample_data_superheroes.sql For Oracle
get the errors:
ORA-00928: missing SELECT keyword
At:
Insert All
into
in Oracle
what is the reason? How to fix?
Thanks,
James