SQL DISTINCT: The Complete Guide

Jul 24, 2018 · 8 min read · sql ·

The SQL DISTINCT keyword lets you find unique rows.

By the end of this article, you’ll know how to use it, what it can do, and understand several different examples.

What is SQL DISTINCT?

DISTINCT is a keyword in SQL that allows you to show unique or distinct results.

It’s added to a SELECT query to eliminate duplicates in the data it displays because columns often contain duplicate values and sometimes you may only want to show unique or distinct values.

If you want to actually remove duplicate records, read this article on removing duplicates with SQL.

Syntax

The syntax of the DISTINCT keyword is as follows:

1SELECT DISTINCT columns
2FROM table…

You can simply add the word DISTINCT after your SELECT keyword.

You can also add the DISTINCT keyword inside aggregate functions:

1SELECT aggregate_function(DISTINCT column)
2FROM table…

We’ll see some examples of this below.

While you're here, if you want an easy-to-use list of the main features in SQL for different vendors, get my SQL Cheat Sheets here:

Invalid DISTINCT Syntax

One way I’ve seen DISTINCT being used is in the middle of a SELECT statement.

For example:

1SELECT col1, DISTINCT col2, col3
2FROM table…

Perhaps the user is trying to show unique values of a particular column.

Because the DISTINCT keyword applies to the entire result set, and not just one column, the DISTINCT must go at the start of the query, just after SELECT:

1SELECT DISTINCT col1, col2, col3
2FROM table…

I’ll show an example of this later in the guide.

Examples of SQL DISTINCT

Let’s take a look at some examples of the DISTINCT keyword in SQL.

These all use sample data that looks like this:

1CREATE TABLE customer (
2  id NUMBER(10),
3  first_name VARCHAR2(100),
4  last_name VARCHAR2(100),
5  address_state VARCHAR2(2),
6  active_status VARCHAR2(1)
7);

id	first_name	last_name	address_state	active
1	Stephen	Jones	AL	A
2	Mark	Smith	CA	A
3	Denise	King	MO	A
4	Paula	Johnson	NY	D
5	Richard	Archer	OR	A
6	Christina	Jones	NY	D

SQL DISTINCT on One Column

The simplest example of using the DISTINCT keyword is on a single column.

1SELECT DISTINCT last_name
2FROM customer;

Result:

last_name
Jones
Smith
King
Johnson
Archer

As you can see, there are 5 results here. However, there were 6 records in the table. Where did the other record go?

The DISTINCT keyword found that there are two last_name values that were the same: “Jones”. DISTINCT will not show duplicate results, so the second value was not shown.

SQL DISTINCT on One Column

Let’s look at another example of DISTINCT on a single column.

1SELECT DISTINCT active
2FROM customer;

active
A
D

This shows only two records, because there are only two different “active” values in the table.

Do you want to know how many values exist for each different ACTIVE value? If so, you’ll need to use a COUNT and GROUP BY:

1SELECT active, COUNT(*)
2FROM customer
3GROUP BY active;

SQL DISTINCT on Multiple Columns

You can use DISTINCT on multiple columns. The DISTINCT keyword applies to the entire result set, so adding DISTINCT to your query with multiple columns will find unique results.

1SELECT DISTINCT last_name, active
2FROM customer;

last_name	active
Jones	A
Smith	A
King	A
Johnson	D
Archer	A
Jones	D

This query also shows 6 results because there are 6 unique records.

The query will find the last_name and active values for all records. It will then eliminate duplicate records using DISTINCT.

The reason Jones is shown twice is because even though the last names are the same, the record is not unique. It isn’t unique because the active values are different. Jones “A” is different to Jones “D”.

SQL DISTINCT on Multiple Columns

Let’s see another example: the unique Address_State and Active values.

1SELECT DISTINCT address_state, active
2FROM customer;

address_state	active
AL	A
CA	A
MO	A
NY	D
OR	A

This result only shows 5 records, because there were two results with an address_state of “NY” and an active value of “A”.

SQL DISTINCT with COUNT

You can use the DISTINCT keyword with the COUNT function.

Why would you do this?

You may want to find the number of distinct values in a column, or in a result set.

This could be different from the total number of values.

Let’s take a look at an example. First, we’ll count the number of values in the address_state column.

1SELECT COUNT(address_state)
2FROM customer;

Result:

COUNT(address_state)
6

This shows the value of 6, because it actually counts the number of non-NULL values in the address_state column in the table.

What if we want to find the number of distinct value?

1SELECT COUNT(DISTINCT address_state)
2FROM customer;

Result:

COUNT(DISTINCTaddress_state)
5

This shows a value of 5, because there are five unique address_state values in the table. This is the right way to use COUNT and DISTINCT.

What if we put the DISTINCT outside the COUNT function? I’ve done this before and wondered why my query doesn’t work!

1SELECT DISTINCT COUNT(address_state)
2FROM customer;

Result:

COUNT(address_state)
6

This shows 6, which is not the number of distinct address_state values.

Why does this not give us the right value?

It’s because the query first counts the number of address_state values, returns a value of 6, and then DISTINCT is applied to the value of 6, which returns 6. Only one result is returned from the COUNT function (one result that has a value of 6), and the DISTINCT shows this one result.

So, if you’re looking to use SQL DISTINCT with COUNT, put the DISTINCT inside the COUNT function and before the column name.

Find out more here: How to Count Distinct Values in SQL

SQL DISTINCT with Join

Can you use a DISTINCT with a JOIN?

Yes, you can. It works just like a regular query, and it will only show you unique results.

For example, this could be the query without the DISTINCT keyword:

1SELECT
2c.id,
3o.order_date
4FROM customer c
5INNER JOIN customer_order o ON c.id = o.customer_id;

Result:

id	order_date
1	1 Jan 2018
2	5 Jan 2018
3	3 Jan 2018
3	14 Apr 2018
4	5 Jan 2018
5	1 Jan 2018
5	1 Jan 2018
6	21 Mar 2018

We can see there are 8 results here.

If we add the DISTINCT keyword:

1SELECT DISTINCT
2c.id,
3o.order_date
4FROM customer c
5INNER JOIN customer_order o ON c.id = o.customer_id;

Result:

id	order_date
1	1 Jan 2018
2	5 Jan 2018
3	3 Jan 2018
3	14 Apr 2018
4	5 Jan 2018
5	1 Jan 2018
6	21 Mar 2018

We can see there are 7 results here. One of the duplicate rows was removed.

Select DISTINCT on One Column with Multiple Columns Returned

Do you want to use the DISTINCT keyword on one column and show multiple columns?

Perhaps you have multiple rows for an ID and want to show only one row for that ID, but want to show multiple columns?

Well, you can do that in SQL, but you can’t do it with DISTINCT.

DISTINCT, as we’ve seen, will show unique rows only. It doesn’t know how to find the distinct values for a column.

But we can use a subquery with a function to get this.

Let’s say we have this data:

id	value1	value2
1	a	b
1	a	d
1	c	t
2	w	r
2	d	c
3	c	e

We want to show a single row for each ID.

If we use DISTINCT, we’ll get the same data, because each row is unique.

How can we show a single row?

First, we need to work out what row we want to show for each ID. Is it the first one that appears? If so, how do we work that out?

In this example, let’s say we want to find the first row for each ID when ordered by value1 then value2.

Our query would look like this:

 1SELECT id,
 2value1,
 3value2
 4FROM (
 5  SELECT id,
 6  value1,
 7  value2,
 8  ROW_NUMBER() OVER(PARTITION BY id ORDER BY value1, value2) rn
 9FROM yourtable) t
10WHERE rn = 1;

There are a few things to notice in this query.

First, we’re selecting all of the columns we want to see.

We’re selecting these from a subquery. The subquery starts by selecting those three columns as well.

Then it includes a ROW_NUMBER function. This function will assign a number value to each row returned. However it includes keywords like OVER and PARTITION BY. This means it has been used as an window function.

This will assign a number value for each row with the same id, starting at 1, ordered by value1 and value2. The column is then labelled “rn” for use in the outer query.

So the subquery would show data like this:

id	value1	value2	ROW_NUMBER
1	a	b	1
1	a	d	2
1	c	t	3
2	d	c	1
2	w	r	2
3	c	e	1

After the subquery, it’s filtered to only show records that have an RN value of 1.

The final result set would look like this:

id	value1	value2	ROW_NUMBER
1	a	b	1
2	d	c	1
3	c	e	1

So while you can’t use DISTINCT to get this result, you can use ROW_NUMBER as analytic function to select distinct values for one column with multiple columns returned.

What’s the Difference Between SQL DISTINCT and UNIQUE?

There’s a keyword in Oracle called UNIQUE. This UNIQUE keyword will also return distinct or unique rows from a result.

So what’s the difference between the keywords DISTINCT and UNIQUE?

Nothing. They are both the same. They perform the exact same operation on the data and are synonymous with each other.

Some articles may claim that one keyword performs a sort and the other does not, but this is not true. It may appear to sort based on your data or indexes, but internally, both keywords are the same.

Tom Kyte has clarified this here.

Conclusion

The SQL DISTINCT keyword is useful for eliminating duplicate results from your result set. It’s a simple keyword but there are a few things to know about it.

Do you have any questions about using DISTINCT? Leave them in the comments section below.