Have you ever seen the PARTITION BY keyword used in SQL queries? If so, do you know what it does and how to use it? Learn about the SQL PARTITION BY keyword and see some examples in this article.
What is the SQL PARTITION BY Keyword?
PARTITION BY is a keyword that can be used in aggregate queries in SQL, such as SUM and COUNT.
This keyword, along with the OVER keyword, allows you to specify the range of records that are used for each group within the function.
It works a little like the GROUP BY clause but it’s a bit different.
Simple Example of PARTITION BY
Let’s take a look at an example.
Suppose we had a subject table that defined subject information for a school. They had a teacher_id that referred to the teacher of that subject.
Let’s say you wanted to find the subject IDs, teacher IDs, and the number of subjects that teacher teaches using COUNT.
SELECT subject_id, teacher_id, COUNT(*) OVER (PARTITION BY teacher_id) AS teacher_count FROM subject;
This table shows you all of the subject_id and teacher_id values. It also shows the teacher_count, which is the number of subjects for each teacher for each row.
You can see in the last few records, the teacher_count value is 2 because there are 2 records with that teacher_id. The value is repeated for that row.
The other records, which have a different value for the teacher_id, have a different count value.
It’s like you’re performing an aggregate function on a set of records, but without aggregating the entire result set.
Why Not Use a Subquery?
Sure, you can also use a subquery to get the same results. However, the subquery would be:
- Slower (probably)
- Harder to write
How would you do this in a subquery?
SELECT s.subject_id, s.teacher_id, sub.teacher_count FROM subject s INNER JOIN ( SELECT teacher_id, COUNT(*) AS teacher_count FROM subject GROUP BY teacher_id) sub ON sub.teacher_id = s.teacher_id ORDER BY s.teacher_id, s.subject_id;
You can see that the results are the same, but the query is longer. It also may perform slower because you’re looking at the table twice (once in the outer query and once in the inner query that has been joined to it) instead of just once.
Why Should I Use PARTITION BY Instead of GROUP BY?
We can try to use a normal query with a GROUP BY to get the same results.
SELECT subject_id, teacher_id, COUNT(*) FROM subject GROUP BY subject_id, teacher_id;
You can see that there is a value of 1 for each record. This is because the COUNT is counting the number of unique pairs of subject and teacher, and they are always unique.
How about we change the GROUP BY to only group by teacher?
SELECT subject_id, teacher_id, COUNT(*) FROM subject GROUP BY teacher_id;
Result (if we run this in Oracle):
ORA-00979: not a GROUP BY expression
This error appears because of how GROUP BY works. You need to specify each column in the SELECT clause that isn’t an aggregate function.
So, we can see that using the GROUP BY won’t give us the result we want. For this purpose, we can use a subquery (like we did earlier), or use the PARTITION BY keyword.
Hopefully that answers your questions on what the SQL PARTITION BY keyword is and how to use it.
Do you have any questions on the PARTITION BY keyword? Leave them in the comments section below.
Lastly, if you enjoy the information and career advice I’ve been providing, sign up to my newsletter below to stay up-to-date on my articles. You’ll also receive a fantastic bonus. Thanks!