FB

In this post, you’ll learn how to set up a sample database so you can practice your SQL queries.

All of the SQL scripts are available for download, and they work on the following databases:

  • Oracle
  • SQL Server
  • MySQL
  • PostgreSQL

(You may be able to tweak them to work on others not mentioned such as MariaDB, DB2 or SQLite).

Let’s get started.

Prerequisites

You won’t need much to get started with this data set.

Access to an Oracle, SQL Server, MySQL, or PostgreSQL database. You can get this by:

  • setting up a database on your computer directly (e.g. Oracle)
  • using Docker images (instructions for Oracle and SQL Server, with MySQL and PostgreSQL coming soon)
  • Use a cloud database (e.g. Oracle cloud, Azure, GCP, AWS)

An IDE to run SQL scripts. You can use any IDE you like (and I’ve got a full list here), but some common combinations are:

  • Oracle SQL Developer + Oracle database
  • SQL Server Management Studio + SQL Server database
  • MySQL Workbench + MySQL database
  • pgAdmin + PostgreSQL database
  • DBeaver + PostgreSQL database

A basic understanding of SQL. You’ll need to know what a Create statement and Insert statement do, but other than that, you should be able to follow these steps.

 

Database Diagram (ERD)

Here’s the ERD of the sample bookstore database we’ll be looking at.

erd_gravity

This contains the following tables:

  • book: a list of all books available in the store.
  • book_author: stores the authors for each book, which is a many-to-many relationship.
  • author: a list of all authors.
  • book_language: a list of possible languages of books.
  • publisher: a list of publishers for books.
  • customer: a list of the customers of the Gravity Bookstore.
  • customer_address: a list of addresses for customers, as a customer can have more than one address, and an address has more than one customer.
  • address_status: a list of statuses for an address, because addresses can be current or old.
  • address: a list of addresses in the system.
  • country: a list of countries that addresses are in.
  • cust_order: a list of orders placed by customers.
  • order_line: a list of books that are a part of each order.
  • shipping_method: the possible shipping methods for an order.
  • order_history: the history of an order, such as ordered, cancelled, delivered.
  • order_status: the possible statuses of an order.

 

SQL Scripts

All of the scripts to load this database are SQL files. We run them one after the other on the database to create the tables and load the data.

Step 1: Download the SQL scripts for the database you’re working with.

You can download the SQL files for each database as ZIP files:

You can use this script to drop all of the tables if you need to start again:

Step 2: Extract the ZIP file to a folder on your computer.

This folder should now have 13 different SQL scripts.

Now we can connect to the database.

 

Connect To Your Database

After you have downloaded the scripts, the next step is to connect to the database.

This will involve opening your IDE and connecting.

Step 3: Open your IDE.

The exact IDE will depend on your database and personal preference, but for this guide, we’ll use MySQL Workbench with a MySQL database.

Step 4: Connect to your database.

This will involve creating a new connection or using an existing connection. Setting this up is outside the scope of this article, but should be done as part of the “setting up your database” prerequisites.

Now, let’s start loading the data in.

You’ve got your IDE open and the SQL scripts downloaded.

We now need to run each of the scripts, one after the other, in numerical order, from 01 to 13.

The following sections will explain a little about each script.

 

Run Script 00 (Oracle Only) to Create a New User

If you’re running this on an Oracle database, you’ll also have a file called 00_oracle_user.sql.

This file will create a new user on your database. This is so you can store your tables and data under that user, instead of the default SYS or SYSTEM user.

If you’re not using Oracle, then skip to the next section (Script 01).

Step 5: Open Script 00.

It will look something like this:

CREATE USER gravity IDENTIFIED BY mypassword;
GRANT CONNECT TO gravity;
GRANT CREATE SESSION, GRANT ANY PRIVILEGE TO gravity;
GRANT UNLIMITED TABLESPACE TO gravity;
GRANT CREATE TABLE TO gravity;
GRANT CREATE SEQUENCE TO gravity;

Step 6: Run the script on the database.

This will create a new user called gravity, with a password of “mypassword”.

Make sure you’re connected to the Pluggable Database (PDB) not the Container Database (CDB). This can be confusing when getting started.

Step 7: Create a new connection using this new user.

In your IDE, create a new connection to the database using the gravity username and password of “mypassword”.

Step 8: Connect to the database.

Using the new connection you made, connect to the database.

You should now be connected to the Oracle database as the gravity user.

 

Run Script 01 to Create the Tables

Step 9: Open script 1.

The filename will depend on which database you’re working with, so it could be:

  • 01_mysql_create.sql
  • 01_oracle_create.sql
  • 01_postgres_create.sql
  • 01_sqlserver_create.sql

Script 1 is used to create all of the tables needed in the bookstore.

It will look something like this:

USE gravity_books;
CREATE TABLE author (
  author_id INT,
  author_name VARCHAR(400),
  CONSTRAINT pk_author PRIMARY KEY (author_id)
);

CREATE TABLE publisher (
  publisher_id INT,
  publisher_name VARCHAR(400),
  CONSTRAINT pk_publisher PRIMARY KEY (publisher_id)
);
...

Step 10: Run script 1.

Once the script is run, all of the tables in this data set will be created. They won’t have any data in them, but we’ll add data in the next few steps.

Note: If at any point during the scripts you need to re-run them, you can drop the tables that are created in this script and run them again.

 

Run Script 02 to Populate the Author

Step 11: Open script 2.

The filename will depend on which database you’re working with, so it could be:

  • 02_mysql_populate_author.sql
  • 02_oracle_populate_author.sql
  • 02_postgres_populate_author.sql
  • 02_sqlserver_populate_author.sql

Script 2 is used to populate the author table. It looks something like this:

INSERT INTO author (author_name, author_id) VALUES
('A. Bartlett Giamatti', 1),
('A. Elizabeth Delany', 2),
('A. Merritt', 3),
('A. Roger Merrill', 4),
('A. Walton Litz', 5),
...

Oracle’s version of the script uses an INSERT ALL syntax, as Oracle does not support inserting multiple rows like the other databases do.

Step 12: Run script 2.

Run this script on your database and the author table should be populated.

 

Run Script 03 to Populate the Publisher

Step 13: Open script 3.

Once again, the filename will be different depending on your database, but it should start with 03:

  • 03_mysql_populate_publisher.sql
  • 03_oracle_populate_publisher.sql
  • 03_postgres_populate_publisher.sql
  • 03_sqlserver_populate_publisher.sql

The script looks like this:

INSERT INTO publisher (publisher_name, publisher_id) VALUES
('10/18', 1),
('1st Book Library', 2),
('1st World Library', 3),
('A & C Black (Childrens books)', 4),
('A Harvest Book/Harcourt Inc.', 5),
('A K PETERS', 6),
...

Step 14: Run script 3.

This will populate the publisher table in your database.

 

Run Script 04 to Populate the Lookup Tables

Step 15: Open script 4.

Script 4 is about populating some of the lookup tables. The filenames differ for each database:

  • 04_mysql_populate_lookups.sql
  • 04_oracle_populate_lookups.sql
  • 04_postgres_populate_lookups.sql
  • 04_sqlserver_populate_lookups.sql

The file looks like this:

INSERT INTO book_language (language_code, language_id, language_name) VALUES
('eng', 1, 'English'),
('en-US', 2, 'United States English'),
('fre', 3, 'French'),
('spa', 4, 'Spanish'),
('en-GB', 5, 'British English'),
...

The script inserts data into the book_language, shipping_method, address_status, and order_status tables.

Step 16: Run script 4.

Once you run script 4, these tables will be populated.

 

Run Script 05 to Populate the Books

Step 17: Open script 5.

Script 5 is all about populating the book table. The filenames look like this:

  • 05_mysql_populate_book.sql
  • 05_oracle_populate_book.sql
  • 05_postgres_populate_book.sql
  • 05_sqlserver_populate_book.sql

The script looks like this:

INSERT INTO book (title, book_id, isbn13, language_id, num_pages, publication_date, publisher_id) VALUES
('The World''s First Love: Mary  Mother of God', 1, '8987059752', 2, 276, '1996-09-01', 1010),
('The Illuminati', 2, '20049130001', 1, 352, '2004-10-04', 1967),
('The Servant Leader', 3, '23755004321', 1, 128, '2003-03-11', 1967),
('What Life Was Like in the Jewel in the Crown: British India  AD 1600-1905', 4, '34406054602', 1, 168, '1999-09-01', 1978),
...

It’s a long script as there are a lot of rows inserted (over 11,000).

Step 18: Run script 5.

This will insert all of the records into the book table.

 

Run Script 06 to Populate Book Author

Step 19: Open script 6.

Script 6 is used to populate the book_author table, which is the joining table between book and author. It stores the combinations of books and authors, as a book can have many authors and an author can have many books.

Related: Database Design Many to Many (YouTube)

The filenames look like this:

  • 06_mysql_populate_bookauthor.sql
  • 06_oracle_populate_bookauthor.sql
  • 06_postgres_populate_bookauthor.sql
  • 06_sqlserver_populate_bookauthor.sql

The script looks like this:

INSERT INTO book_author (book_id, author_id) VALUES
(1570, 2823),
(10539, 6439),
(8323, 7011),
(7344, 3760),
...

There are over 17,000 rows in this script.

Step 20: Run script 6.

Once the script is run, the book_author table will be populated.

 

Run Script 07 to Populate Country

Step 21: Open script 7.

Script 7 populates the country table. The scripts are called:

  • 07_mysql_populate_country.sql
  • 07_oracle_populate_country.sql
  • 07_postgres_populate_country.sql
  • 07_sqlserver_populate_country.sql

The scripts look like this:

INSERT INTO country (country_id, country_name) VALUES
(1, 'Afghanistan'),
(2, 'Netherlands Antilles'),
(3, 'Albania'),
(4, 'Algeria'),
(5, 'Andorra'),
(6, 'Angola'),
...

There are just over 200 records inserted into this table.

Step 22: Run script 7.

Once you run the script, the country table is populated.

 

Run Script 08 to Populate Address

Step 23: Open script 8.

Script 8 populates the address table. The scripts are called:

  • 08_mysql_populate_address.sql
  • 08_oracle_populate_address.sql
  • 08_postgres_populate_address.sql
  • 08_sqlserver_populate_address.sql

The scripts look like this:

INSERT INTO address (address_id, street_number, street_name, city, country_id) VALUES
(1, '57', 'Glacier Hill Avenue', 'Torbat-e Jām', 95),
(2, '86', 'Dottie Junction', 'Beaumont', 37),
(3, '292', 'Ramsey Avenue', 'Cayambe', 60),
(4, '5618', 'Thackeray Junction', 'Caldas', 47),
(5, '4', '2nd Park', 'Ngunguru', 153),
...

There are 1,000 records inserted into this table. They are mock addresses: generated using a “mock data” generation service.

Step 24: Run script 8.

Once you run the script, the address table is populated.

 

Run Script 09 to Populate Customer

Step 25: Open script 9.

Script 9 is used to populate the customer table.

Once again, the filenames are different depending on the database:

  • 09_mysql_populate_customer.sql
  • 09_oracle_populate_customer.sql
  • 09_postgres_populate_customer.sql
  • 09_sqlserver_populate_customer.sql

The script looks like this:

INSERT INTO customer (customer_id, first_name, last_name, email) VALUES
(1, 'Ursola', 'Purdy', '[email protected]'),
(2, 'Ruthanne', 'Vatini', '[email protected]'),
(3, 'Reidar', 'Turbitt', '[email protected]'),
(4, 'Rich', 'Kirsz', '[email protected]'),
(5, 'Carline', 'Kupis', '[email protected]'),
(6, 'Kandy', 'Adamec', '[email protected]'),
(7, 'Jermain', 'Giraudeau', '[email protected]s.com'),
(8, 'Nolly', 'Bonicelli', '[email protected]'),
...

Step 26: Run script 9.

This will insert 2,000 records into the customer table.

 

Run Script 10 to Populate Customer Address

Step 27: Open script 10.

This script will populate the customer_address table. The filenames are:

  • 10_mysql_populate_others.sql
  • 10_oracle_populate_others.sql
  • 10_postgres_populate_others.sql
  • 10_sqlserver_populate_others.sql

The script looks like this:

/* Insert all 2000 customers */
INSERT INTO customer_address (customer_id, address_id, status_id)
SELECT c.customer_id,
       (SELECT address_id
    FROM address
    ORDER BY RAND()
    LIMIT 1),
    1
FROM customer c;

/* Add a second active address for 750 of those customers */
INSERT INTO customer_address (customer_id, address_id, status_id)
SELECT customer_id, address_id, 1
FROM (
         SELECT ca.customer_id,
                (SELECT address_id
                 FROM address
                 ORDER BY RAND()
                 LIMIT 1) AS address_id
         FROM customer_address ca
         ORDER BY RAND()

     ) sub
WHERE NOT EXISTS (
    SELECT 1
    FROM customer_address c
    WHERE c.customer_id = sub.customer_id
    AND c.address_id = sub.address_id
    )
LIMIT 750;

It uses several different queries to insert a specific number of random rows into the table. This is much more efficient and results in a smaller file size than having a long INSERT query where the rows are specified.

You can change the LIMIT clauses in some of the statements to add more or fewer rows, but it will run fine using the default values.

Step 28: Run script 10.

The script will insert a few thousand rows into the customer_address table.

 

Run Script 11 to Populate Customer Orders

Step 29: Open script 11.

This script will populate the cust_order table. The scripts, like others, differ depending on the database used:

  • 11_mysql_populate_order.sql
  • 11_oracle_populate_order.sql
  • 11_postgres_populate_order.sql
  • 11_sqlserver_populate_order.sql

The script looks like this:

/* Add 2500 orders */
INSERT INTO cust_order (order_date, customer_id, shipping_method_id, dest_address_id)
SELECT
    NOW() - INTERVAL FLOOR(RAND() * 365 * 3 * 24 * 60 * 60) SECOND AS order_date,
    c.customer_id,
    FLOOR(1 + RAND() * 4) AS shipping_method_id,
    ca.address_id
FROM customer c
INNER JOIN customer_address ca ON c.customer_id = ca.customer_id
LIMIT 2500;
...

The script uses subqueries and the LIMIT clause to insert a randomised list of records into the cust_order table.

Step 30: Run script 11.

This will insert a few thousand rows into the cust_order table.

 

Run Script 12 to Populate Order Lines

Step 31: Open script 12.

Script 12 is about populating the order line table:

  • 12_mysql_populate_orderline.sql
  • 12_oracle_populate_orderline.sql
  • 12_postgres_populate_orderline.sql
  • 12_sqlserver_populate_orderline.sql

The script looks like this:

/* Add 2500 orders */
INSERT INTO cust_order (order_date, customer_id, shipping_method_id, dest_address_id)
SELECT
    NOW() - INTERVAL FLOOR(RAND() * 365 * 3 * 24 * 60 * 60) SECOND AS order_date,
    c.customer_id,
    FLOOR(1 + RAND() * 4) AS shipping_method_id,
    ca.address_id
FROM customer c
INNER JOIN customer_address ca ON c.customer_id = ca.customer_id
LIMIT 2500;
...

 

Just like some of the earlier steps, it uses the LIMIT clause and random functions to insert a specified number of random rows into the order_line table.

You can change the LIMIT parameters to insert more or fewer rows if you like.

Step 32: Run script 12.

Running this script will insert 7,850 records into the order_line table.

 

Run Script 13 to Populate Order History

Step 33: Open script 13.

The final script, script 13, will add records into the order_history table.

The script names are:

  • 13_mysql_populate_orderhistory.sql
  • 13_oracle_populate_orderhistory.sql
  • 13_postgres_populate_orderhistory.sql
  • 13_sqlserver_populate_orderhistory.sql

The script looks like this:

/* Add order received for all orders older than 12 hours
#Total 7547 orders*/
INSERT INTO order_history (order_id, status_id, status_date)
SELECT
o.order_id,
1 AS status_id,
o.order_date + INTERVAL FLOOR(RAND() * 60 * 60 * 12) SECOND AS status_date
FROM cust_order o
WHERE o.order_date < NOW() - INTERVAL 12 HOUR;
...

The LIMIT clause in some of the queries can be adjusted to add more or less records.

Step 34: Run script 13

The script inserts just over 20,000 records into the table.

 

Once you’ve run script 13, all of the data is now in the tables. You can now run other queries on the tables to analyse the data.

 

Conclusion

This data set was generated for a bookstore, and by downloading and running the SQL files, you can get the dataset loaded onto your own computer.

Do you have any issues loading the data, or any questions? Let me know in the comments below.

Share via
Copy link
Powered by Social Snap