The SQL DISTINCT keyword lets you find unique rows.
By the end of this article, you’ll know how to use it, what it can do, and understand several different examples.
What is SQL DISTINCT?
DISTINCT is a keyword in SQL that allows you to show unique or distinct results.
It’s added to a SELECT query to eliminate duplicates in the data it displays because columns often contain duplicate values and sometimes you may only want to show unique or distinct values.
If you want to actually remove duplicate records, read this article on removing duplicates with SQL.
Syntax
The syntax of the DISTINCT keyword is as follows:
SELECT DISTINCT columns
FROM table…
You can simply add the word DISTINCT after your SELECT keyword.
You can also add the DISTINCT keyword inside aggregate functions:
SELECT aggregate_function(DISTINCT column)
FROM table…
We’ll see some examples of this below.
While you’re here, if you want an easy-to-use list of the main features in SQL for different vendors, get my SQL Cheat Sheets here:
Invalid DISTINCT Syntax
One way I’ve seen DISTINCT being used is in the middle of a SELECT statement.
For example:
SELECT col1, DISTINCT col2, col3
FROM table…
Perhaps the user is trying to show unique values of a particular column.
Because the DISTINCT keyword applies to the entire result set, and not just one column, the DISTINCT must go at the start of the query, just after SELECT:
SELECT DISTINCT col1, col2, col3
FROM table…
I’ll show an example of this later in the guide.
Examples of SQL DISTINCT
Let’s take a look at some examples of the DISTINCT keyword in SQL.
These all use sample data that looks like this:
CREATE TABLE customer (
id NUMBER(10),
first_name VARCHAR2(100),
last_name VARCHAR2(100),
address_state VARCHAR2(2),
active_status VARCHAR2(1)
);
id | first_name | last_name | address_state | active |
1 | Stephen | Jones | AL | A |
2 | Mark | Smith | CA | A |
3 | Denise | King | MO | A |
4 | Paula | Johnson | NY | D |
5 | Richard | Archer | OR | A |
6 | Christina | Jones | NY | D |
SQL DISTINCT on One Column
The simplest example of using the DISTINCT keyword is on a single column.
SELECT DISTINCT last_name
FROM customer;
Result:
last_name |
Jones |
Smith |
King |
Johnson |
Archer |
As you can see, there are 5 results here. However, there were 6 records in the table. Where did the other record go?
The DISTINCT keyword found that there are two last_name values that were the same: “Jones”. DISTINCT will not show duplicate results, so the second value was not shown.
SQL DISTINCT on One Column
Let’s look at another example of DISTINCT on a single column.
SELECT DISTINCT active
FROM customer;
active |
A |
D |
This shows only two records, because there are only two different “active” values in the table.
Do you want to know how many values exist for each different ACTIVE value? If so, you’ll need to use a COUNT and GROUP BY:
SELECT active, COUNT(*)
FROM customer
GROUP BY active;
SQL DISTINCT on Multiple Columns
You can use DISTINCT on multiple columns. The DISTINCT keyword applies to the entire result set, so adding DISTINCT to your query with multiple columns will find unique results.
SELECT DISTINCT last_name, active
FROM customer;
last_name | active |
Jones | A |
Smith | A |
King | A |
Johnson | D |
Archer | A |
Jones | D |
This query also shows 6 results because there are 6 unique records.
The query will find the last_name and active values for all records. It will then eliminate duplicate records using DISTINCT.
The reason Jones is shown twice is because even though the last names are the same, the record is not unique. It isn’t unique because the active values are different. Jones “A” is different to Jones “D”.
SQL DISTINCT on Multiple Columns
Let’s see another example: the unique Address_State and Active values.
SELECT DISTINCT address_state, active
FROM customer;
address_state | active |
AL | A |
CA | A |
MO | A |
NY | D |
OR | A |
This result only shows 5 records, because there were two results with an address_state of “NY” and an active value of “A”.
SQL DISTINCT with COUNT
You can use the DISTINCT keyword with the COUNT function.
Why would you do this?
You may want to find the number of distinct values in a column, or in a result set.
This could be different from the total number of values.
Let’s take a look at an example. First, we’ll count the number of values in the address_state column.
SELECT COUNT(address_state)
FROM customer;
Result:
COUNT(address_state) |
6 |
This shows the value of 6, because it actually counts the number of non-NULL values in the address_state column in the table.
What if we want to find the number of distinct value?
SELECT COUNT(DISTINCT address_state)
FROM customer;
Result:
COUNT(DISTINCTaddress_state) |
5 |
This shows a value of 5, because there are five unique address_state values in the table. This is the right way to use COUNT and DISTINCT.
What if we put the DISTINCT outside the COUNT function? I’ve done this before and wondered why my query doesn’t work!
SELECT DISTINCT COUNT(address_state)
FROM customer;
Result:
COUNT(address_state) |
6 |
This shows 6, which is not the number of distinct address_state values.
Why does this not give us the right value?
It’s because the query first counts the number of address_state values, returns a value of 6, and then DISTINCT is applied to the value of 6, which returns 6. Only one result is returned from the COUNT function (one result that has a value of 6), and the DISTINCT shows this one result.
So, if you’re looking to use SQL DISTINCT with COUNT, put the DISTINCT inside the COUNT function and before the column name.
Find out more here: How to Count Distinct Values in SQL
SQL DISTINCT with Join
Can you use a DISTINCT with a JOIN?
Yes, you can. It works just like a regular query, and it will only show you unique results.
For example, this could be the query without the DISTINCT keyword:
SELECT
c.id,
o.order_date
FROM customer c
INNER JOIN customer_order o ON c.id = o.customer_id;
Result:
id | order_date |
1 | 1 Jan 2018 |
2 | 5 Jan 2018 |
3 | 3 Jan 2018 |
3 | 14 Apr 2018 |
4 | 5 Jan 2018 |
5 | 1 Jan 2018 |
5 | 1 Jan 2018 |
6 | 21 Mar 2018 |
We can see there are 8 results here.
If we add the DISTINCT keyword:
SELECT DISTINCT
c.id,
o.order_date
FROM customer c
INNER JOIN customer_order o ON c.id = o.customer_id;
Result:
id | order_date |
1 | 1 Jan 2018 |
2 | 5 Jan 2018 |
3 | 3 Jan 2018 |
3 | 14 Apr 2018 |
4 | 5 Jan 2018 |
5 | 1 Jan 2018 |
6 | 21 Mar 2018 |
We can see there are 7 results here. One of the duplicate rows was removed.
Select DISTINCT on One Column with Multiple Columns Returned
Do you want to use the DISTINCT keyword on one column and show multiple columns?
Perhaps you have multiple rows for an ID and want to show only one row for that ID, but want to show multiple columns?
Well, you can do that in SQL, but you can’t do it with DISTINCT.
DISTINCT, as we’ve seen, will show unique rows only. It doesn’t know how to find the distinct values for a column.
But we can use a subquery with a function to get this.
Let’s say we have this data:
id | value1 | value2 |
1 | a | b |
1 | a | d |
1 | c | t |
2 | w | r |
2 | d | c |
3 | c | e |
We want to show a single row for each ID.
If we use DISTINCT, we’ll get the same data, because each row is unique.
How can we show a single row?
First, we need to work out what row we want to show for each ID. Is it the first one that appears? If so, how do we work that out?
In this example, let’s say we want to find the first row for each ID when ordered by value1 then value2.
Our query would look like this:
SELECT id,
value1,
value2
FROM (
SELECT id,
value1,
value2,
ROW_NUMBER() OVER(PARTITION BY id ORDER BY value1, value2) rn
FROM yourtable) t
WHERE rn = 1;
There are a few things to notice in this query.
First, we’re selecting all of the columns we want to see.
We’re selecting these from a subquery. The subquery starts by selecting those three columns as well.
Then it includes a ROW_NUMBER function. This function will assign a number value to each row returned. However it includes keywords like OVER and PARTITION BY. This means it has been used as an window function.
This will assign a number value for each row with the same id, starting at 1, ordered by value1 and value2. The column is then labelled “rn” for use in the outer query.
So the subquery would show data like this:
id | value1 | value2 | ROW_NUMBER |
1 | a | b | 1 |
1 | a | d | 2 |
1 | c | t | 3 |
2 | d | c | 1 |
2 | w | r | 2 |
3 | c | e | 1 |
After the subquery, it’s filtered to only show records that have an RN value of 1.
The final result set would look like this:
id | value1 | value2 | ROW_NUMBER |
1 | a | b | 1 |
2 | d | c | 1 |
3 | c | e | 1 |
So while you can’t use DISTINCT to get this result, you can use ROW_NUMBER as analytic function to select distinct values for one column with multiple columns returned.
What’s the Difference Between SQL DISTINCT and UNIQUE?
There’s a keyword in Oracle called UNIQUE. This UNIQUE keyword will also return distinct or unique rows from a result.
So what’s the difference between the keywords DISTINCT and UNIQUE?
Nothing. They are both the same. They perform the exact same operation on the data and are synonymous with each other.
Some articles may claim that one keyword performs a sort and the other does not, but this is not true. It may appear to sort based on your data or indexes, but internally, both keywords are the same.
Tom Kyte has clarified this here.
Conclusion
The SQL DISTINCT keyword is useful for eliminating duplicate results from your result set. It’s a simple keyword but there are a few things to know about it.
Do you have any questions about using DISTINCT? Leave them in the comments section below.
While you’re here, if you want an easy-to-use list of the main features in SQL for different vendors, get my SQL Cheat Sheets here:
What if I count distinct row in a table and I want it to return a zero if no values exist.
ex:
select ProductName,
Product_Category,
count (distinct ProductSalesTotal)
from ProductList
inner join ProductSales
where CompanyId = 123
and date > ‘2019-08-01’
–If there were no sales for that item, therefore no entries in the ProductSales Table, the item will not show up in the results and I would like it to show zero.
If you want to show 0 if there are no ProductSales, you could try a LEFT JOIN instead of an INNER JOIN. This may give you the result you need. Does that work?
What if you had a table with months 1-12 and products, for ease lets say there are 5 products. If you wanted to get every combination, then add other data later, how would you do that? So it should look something like this after your done
Month | Product
1 | 1
1 | 2
1 | 3
1 | 4
1 | 5
2 | 1
2 | 2
2 | 3
and so on
I think you can use a FULL OUTER JOIN for this, which creates a “cartesian product”. This is helpful to get every combination. From memory, the syntax is something like: SELECT columns FROM month_table FULL JOIN product_table;