FB

A self join is a common solution to a specific table design in databases. Learn what a self join is and how to write queries against it in this article.

What Is a Self Join?

A self join is a type of join in a database where you join a table to itself. It’s necessary in some designs where a hierarchy exists within a table.What Is a Self Join

 

Why is a Self Join Needed?

The main reason you would want to do a self join is where you need records from a table and there is information in that table for one record and its related record.

This is common if your data is set up in a hierarchy.

We could use a subquery, but using a self join is a more efficient method.

Get Your Free PDF: 9 Ways to Improve your Database Skills

 

Why Not Use Separate Tables?

You might be wondering, why are we storing this information in the same table? Why can’t we use separate tables?

We could, but this could be a problem.

Let’s say we have an employee table and a manager table, where the manager table looks like the employee table. The manager table contains records for managers.

What if the manager has a manager? Do we create a third table to store those people, and call the table “general manager” or whatever the role is in the company?

How far up the hierarchy do we need to go?

What if there are 6 levels in your organisation?

What if some departments don’t have all levels? For example, the Sales department might have 5 levels, which means all employees will be in the manager table instead of the employee table?

What if the organisation roles or number of levels changes? There would be significant changes to your table structure.

So, where there is any hierarchy of data, it’s better to use a single table and have the child row refer to the parent row.

It’s also against the rules of normalisation to create a second copy of the same table.

 

Example of a Self Join

I’ve mentioned a table having a hierarchy in this article, but what does that mean?

Let’s take a look at an example.

This example is a common one that’s used when explaining self joins and hierarchies.

Let’s assume we have an employee table:

EMPLOYEE_ID FIRST_NAME LAST_NAME DEPARTMENT_ID MANAGER_ID
1 Michelle Foster 8 162
2 Cheryl Turner 3 99
3 Carolyn Hudson 7 199
4 Patrick Berry 3 159
5 Doris Powell 1  (null)

Now, in this table, the manager_id column refers to a different record in the employee table, which represents that employee’s manager.

So, employee ID 1 has a manager id of 162, which refers to employee ID 162. Employee ID 2’s manager is employee ID 99.

How can we find a list of all employee names and their manager’s names?

We do this by writing a query that has a join to itself.

SELECT
e.employee_id,
e.first_name,
e.last_name,
e.department_id,
e.manager_id,
m.employee_id,
m.first_name,
m.last_name,
m.department_id,
m.manager_id
FROM employee e
INNER JOIN employee m ON e.manager_id = m.employee_id;

The results of this query are:

EMPLOYEE_ID FIRST_NAME LAST_NAME DEPARTMENT_ID MANAGER_ID EMPLOYEE_ID_1 FIRST_NAME_1 LAST_NAME_1 DEPARTMENT_ID_1 MANAGER_ID_1
2 Cheryl Turner 3 99 99 Mark Jackson 6 57
4 Patrick Berry 3 159 159 Gary Knight 5 16
1 Michelle Foster 8 162 162 Kimberly Mendoza 3 191
3 Carolyn Hudson 7 199 199 Lawrence Henderson 6 47

You can see here that we have selected from the employee table twice. We have given the first table an alias of “e”, which is to represent the employee. Then, we give the second employee table an alias of “m”, which represents the manager.

Then, when we join the tables together, we have joined on different fields.

We have joined them on the employee’s manager id being equal to the manager’s employee id.

This is the key part of the self join. There needs to be a field in the table that refers to another field in the table.

Also, the tables need to have aliases. This is so Oracle (and us as the developer) knows which employee table to refer to when performing the join and displaying the columns.

In this query, we’re showing the employee’s information first, then the manager’s information.

The columns on the right half of the table have a “_1” after the headings. This is because columns with those names were already being returned from another table, so Oracle adds a “_1” to them. You could use column aliases to rename them.

We can see that there are 4 records here, and each employee has their manager information showing.

However, there are some records not showing. The records not being shown are those that have a manager_id of NULL, where they have no manager.

To display these records, we can change the query to use a LEFT JOIN.

SELECT
e.employee_id,
e.first_name,
e.last_name,
e.department_id,
e.manager_id,
m.employee_id,
m.first_name,
m.last_name,
m.department_id,
m.manager_id
FROM employee e
LEFT JOIN employee m ON e.manager_id = m.employee_id;

EMPLOYEE_ID FIRST_NAME LAST_NAME DEPARTMENT_ID MANAGER_ID EMPLOYEE_ID_1 FIRST_NAME_1 LAST_NAME_1 DEPARTMENT_ID_1 MANAGER_ID_1
2 Cheryl Turner 3 99 99 Mark Jackson 6 57
4 Patrick Berry 3 159 159 Gary Knight 5 16
1 Michelle Foster 8 162 162 Kimberly Mendoza 3 191
3 Carolyn Hudson 7 199 199 Lawrence Henderson 6 47
5 Doris Powell 1  (null)  (null)  (null)  (null)  (null)  (null)

Now we can see all employees, whether they have a manager or not.

Let’s remove some of the columns we don’t need.

SELECT

e.employee_id,

e.first_name,

e.last_name,

e.department_id,

m.employee_id,

m.first_name,

m.last_name

FROM employee e

LEFT JOIN employee m ON e.manager_id = m.employee_id;

 

EMPLOYEE_ID FIRST_NAME LAST_NAME DEPARTMENT_ID EMPLOYEE_ID_1 FIRST_NAME_1 LAST_NAME_1
2 Cheryl Turner 3 99 Mark Jackson
4 Patrick Berry 3 159 Gary Knight
1 Michelle Foster 8 162 Kimberly Mendoza
3 Carolyn Hudson 7 199 Lawrence Henderson
5 Doris Powell 1  (null)  (null)  (null)

Now it shows the same records but less data.

Let’s look at a few things you need to keep in mind when working with self joins.

 

Things to Remember when Using Self Joins

There are a few things to remember when you’re writing queries that use self joins.

Both tables must have an alias

In your query, both of your tables must have an alias.

Oracle needs to know how to process each of the tables and to get the data from the right table. I often use an alias that describes what the table is representing (e.g. e for employee and m for manager, mentioned above)

Use table aliases in your SELECT clause

You’ll also need to use your table aliases in your SELECT clause. This is because you’re selecting data from two instances of the same table, and Oracle needs to know which table you’re referring to.

For example, if you say employee_name, are you referring to the employee’s employee_name or the manager’s employee_name? Using a table alias next to each of these columns (e.g. e.employee_name or m.employee_name) will avoid any errors.

Confirm your join fields

Unlike other joins where your columns are likely to be the same (using a primary key and foreign key relationship), a self join uses joins on different fields.

If you join on the same field, you won’t get the same results.

 

Other Self Join Examples

Some other situations that you may require a self join include:

  • Parents and children
  • Departments within a company
  • Locations within other locations (e.g. city, suburb, county, state)
  • Subject categories or groups for a school or education system
  • Parts that contain other parts
  • Many more

 

So, that’s what a self join is and how you write queries in SQL for self joins.

Lastly, if you enjoy the information and career advice I’ve been providing, sign up to my newsletter below to stay up-to-date on my articles. You’ll also receive a fantastic bonus. Thanks!

Get Your Free PDF: 9 Ways to Improve your Database Skills

Copyright: dariusl / 123RF Stock Photo

Get Your SQL Function Cheat Sheet Now: