FB

The SQL GROUP BY clause and SQL HAVING clause are powerful clauses, especially when analysing large amounts of data. Learn what the GROUP BY and HAVING clause is and everything you can do with it in this article.

This article applies to Oracle, SQL Server, MySQL, and PostgreSQL.

 

What Is The SQL GROUP BY Clause?

The GROUP BY clause is a clause in the SELECT statement. It allows you to create groups of values when using aggregating functions.

Aggregate functions without a GROUP BY will return a single value.

If you want to find the aggregate value for each value of X, you can GROUP BY x to find it.

It allows you to find the answers to problems such as:

  • The total number of sales per month
  • The average salary of an employee per department
  • The number of students enrolled per year

It goes towards the end of your SELECT query.

The SELECT statement could look like this:

SELECT columns
FROM table
WHERE condition
GROUP BY columns
HAVING condition
ORDER BY columns

The GROUP BY clause in the query is what we will be discussing in this article. We’ll also cover the HAVING clause as they are closely related.

Get Your SQL Cheat Sheet

Download the SQL Cheat Sheets: common commands and syntax - to save you time.
You'll get them for Oracle, SQL Server, MySQL, and PostgreSQL.
Print them or use them as an easy reference.

 

Sample Data

For our examples in this article, we’ll create a sample table with some data.

This data was generated using Oracle features, but the rest of the article focuses on SQL Server, MySQL, and PostgreSQL as well.

You can use a combination of DBMS_RANDOM and CONNECT BY to generate random data.

This table captures grades that students have received over several years. The grade is a decimal number, and we’re capturing the ID numbers of the students and subjects.

CREATE TABLE sample_group_table (
  school_year NUMBER,
  student_id NUMBER,
  subject_id NUMBER,
  student_grade NUMBER(5, 2)
);

INSERT INTO sample_group_table (school_year, student_id, subject_id, student_grade)
SELECT
TRUNC(DBMS_RANDOM.value(low => 2013, high => 2016)),
TRUNC(DBMS_RANDOM.value(low => 1, high => 12)),
TRUNC(DBMS_RANDOM.value(low => 1, high => 10)),
ROUND(DBMS_RANDOM.value(low => 1, high => 100), 2)
FROM dual
CONNECT BY level <= 1000;

Now we have our sample data, let’s take a look at some examples.

 

Examples of Aggregate Functions with SQL GROUP BY

Let’s see some examples of some aggregate functions with the GROUP BY clause.

 

Example 1 – COUNT

Let’s say we want to find out the COUNT of student grade records per year.

Our query would need to show the year, and the COUNT of each year.

SELECT school_year,
COUNT(*)
FROM sample_group_table
GROUP BY school_year;

We put the column school_year into the GROUP BY clause because we want to show the COUNT result for each instance of the school_year.

Also, when we use aggregate functions, we need to add any non-aggregate columns into the GROUP BY. Otherwise, we’ll get an error.

So, let’s run this query:

SCHOOL_YEARCOUNT(*)
2013325
2014349
2015326

It shows us that there are 3 rows here. We have a COUNT of each school_year value, and it shows different numbers in each year.

 

Example 2 – MAX

What if we wanted to find the maximum score for each subject, regardless of the student or year?

We would show the subject ID and the MAX function.

SELECT subject_id,
MAX(student_grade) AS max_score
FROM sample_group_table
GROUP BY subject_id;

Results:

SUBJECT_IDMAX_SCORE
199.62
699.61
299
599.19
499.26
898.75
398.06
799.93
998.87

We can see the highest grade value for each subject.

However, it’s not in any order.

We can order by either the subject_id or the student_grade.

Let’s order by the maximum student_grade.

SELECT subject_id,
MAX(student_grade) AS max_score
FROM sample_group_table
GROUP BY subject_id
ORDER BY MAX(student_grade) DESC;

Result:

SUBJECT_IDMAX_SCORE
799.93
199.62
699.61
499.26
599.19
299
998.87
898.75
398.06

You can see that subject ID 8 has the highest score, and subject ID 7 has the lowest score.

 

Example 3 – AVG

You can do the same kind of query with MIN or AVG or SUM.

Let’s try an example using AVG. Let’s find the average student_grade per student, regardless of subject and year. We’ll order this by the student_id.

SELECT student_id,
AVG(student_grade) AS avg_grade
FROM sample_group_table
GROUP BY student_id
ORDER BY student_id;

Result:

STUDENT_IDAVG_GRADE
150.93583333
252.85352273
353.276
447.63966292
548.47636364
651.227
753.86227723
847.49273684
953.02617021
1050.32844444
1151.5071875

You can see that AVG also works with the GROUP BY clause.

 

Example 4 – Multiple GROUP BY Columns

We can also use more than one column in the GROUP BY clause.

Let’s say we wanted to see the SUM of all students grades for all years.

Our query would look like this:

SELECT student_id,
school_year,
SUM(student_grade) AS total_grades
FROM sample_group_table
GROUP BY student_id, school_year
ORDER BY student_id, school_year;

Results:

STUDENT_IDSCHOOL_YEARTOTAL_GRADES
120131455.48
120141179.21
120151643.92
220131495.94
220141702.22
220151452.95
320131589.43
320141630.19
320151841.6
420131236.79
420141505.89
420151497.25
520131416.19
520141651.4
520151198.33
620131328.23
620141635.57
620151134.36
720131621.9
720142035.9
720151782.29
820131137.8
820141700.64
820151673.37
920131920.63
920141863.64
920151200.19
1020131666.61
1020141438.98
1020151423.97
1120131580.63
1120141673.48
1120151690.58

It shows the student ID, the school year, and the SUM of the student grade for those values.

 

What Is the SQL HAVING Clause?

The SQL HAVING clause is a clause in SQL that lets you filter data after it has been grouped with the GROUP BY clause.

If you want to filter data in your SQL query to remove rows based on certain criteria, you would use the WHERE clause.

However, you can’t use the WHERE clause with data from aggregate functions.

This is because the database processes the WHERE clause first, then the GROUP BY. It can’t “go back” and run another WHERE clause against the result of a COUNT function, for example.

To filter data after it has been grouped, you can use the HAVING clause.

The SQL HAVING clause works in a similar way to the WHERE clause.

You specify the conditions after the word HAVING, and any records that meet these criteria after the GROUP BY has been performed will be shown, and all others will not be shown.

 

Example 1 – AVG

Let’s say we have our average student grade per student. And let’s say we want to only show students where their average is above 52.

We would use the HAVING clause.

SELECT student_id,
AVG(student_grade) AS avg_grade
FROM sample_group_table
GROUP BY student_id
HAVING AVG(student_grade) >= 52
ORDER BY student_id;

Results:

STUDENT_IDAVG_GRADE
252.85352273
353.276
753.86227723
953.02617021

You can see there are a few records here, and these are the only students with an average grade of 52 or over.

Notice that we had to use the AVG function in its complete form in the HAVING clause, and we couldn’t use the column alias. This is because the HAVING clause cannot accept column aliases, because the SELECT clause is run last.

If we were to run a query that used the column alias in Oracle SQL:

SELECT student_id,
AVG(student_grade) AS avg_grade
FROM sample_group_table
GROUP BY student_id
HAVING avg_grade >= 52
ORDER BY student_id;

Result:

ORA-00904: "AVG_GRADE": invalid identifier
00904. 00000 -  "%s: invalid identifier"

This ORA-00904 error means we can’t use the alias in the HAVING clause.

 

Example 2 – SUM

We can build on another example and filter it using HAVING.

In an earlier example, we displayed the student ID, school year, and the SUM of their grades.

Let’s say we wanted to see this data, but only for students that had a total of 600 or more.

SELECT student_id,
school_year,
SUM(student_grade) AS total_grades
FROM sample_group_table
GROUP BY student_id, school_year
HAVING SUM(student_grade) >= 600
ORDER BY student_id, school_year;

Result:

STUDENT_IDSCHOOL_YEARTOTAL_GRADES
120131455.48
120141179.21
120151643.92
220131495.94
220141702.22
220151452.95
320131589.43
320141630.19
320151841.6
420131236.79
420141505.89
420151497.25
520131416.19
520141651.4
520151198.33
620131328.23
620141635.57
620151134.36
720131621.9
720142035.9
720151782.29
820131137.8
820141700.64
820151673.37
920131920.63
920141863.64
920151200.19
1020131666.61
1020141438.98
1020151423.97
1120131580.63
1120141673.48
1120151690.58

This shows how we can use the HAVING clause with a SUM function and multiple columns.

 

GROUP BY ROLLUP

The SQL GROUP BY clause has more to it than just specifying columns to group by.

There are several different grouping options you can use, and one of them is ROLLUP.

The ROLLUP SQL grouping type allows you to group by subtotals and a grand total.

This is available in Oracle, SQL Server, MySQL, and PostgreSQL.

It looks like this:

SELECT student_id,
school_year,
AVG(student_grade) AS avg_grades
FROM sample_group_table
GROUP BY ROLLUP (student_id, school_year)
ORDER BY student_id, school_year;

This will create grouping levels for:

  • Each combination of student_id and school_year
  • A subtotal for each student_id
  • An overall total

So, if the rollup was like this:

ROLLUP (A, B)

Your groups would be:

  • a, b
  • a
  • overall

If we run this query above, we get these results:

STUDENT_IDSCHOOL_YEARAVG_GRADES
1201348.516
1201449.13375
1201554.79733333
1(null)50.93583333
2201351.58413793
2201458.69724138
2201548.43166667
2(null)52.85352273
3201351.27193548
3201449.39969697
3201559.40645161
3(null)53.276
4201351.53291667
4201441.83027778
4201551.62931034
4(null)47.63966292
5201352.45148148
5201448.57058824
5201544.38259259
5(null)48.47636364
6201355.34291667
6201454.519
6201543.62923077
6(null)51.227
7201352.31935484
7201459.87941176
7201549.50805556
7(null)53.86227723
8201339.23448276
8201451.53454545
8201550.70818182
8(null)47.49273684
9201356.48911765
9201454.81294118
9201546.16115385
9(null)53.02617021
10201349.01794118
10201447.966
10201554.76807692
10(null)50.32844444
11201349.3946875
11201452.29625
11201552.830625
11(null)51.5071875
(null)(null)51.00556

You can see that there is an average for each combination of student_id and school_year.

There is also a row that has a student_id and a NULL value for the school_year. This indicates that it is the subtotal for all school_year values for that student_id.

At the bottom, there is a row where all the values are NULL, which is the overall total.

How can you tell which rows are the subtotals and which rows indicate where the raw data is NULL? Looking at this, you can’t. But we’ll address that issue later in this article.

 

GROUP BY CUBE in SQL

Another type of SQL GROUP BY you can do is GROUP BY CUBE.

It’s similar to ROLLUP in SQL but allows for more subtotals to be shown.

GROUP BY CUBE is available in Oracle, SQL Server, and PostgreSQL. GROUP BY CUBE is not available in MySQL.

If we have the same query as before, we can replace ROLLUP with CUBE.

SELECT student_id,
school_year,
AVG(student_grade) AS avg_grades
FROM sample_group_table
GROUP BY CUBE (student_id, school_year)
ORDER BY student_id, school_year;

Cube in SQL will show subtotals for all combinations of columns.

Using the columns above, it will show:

  • Each combination of student_id and school_year (same as ROLLUP)
  • A subtotal for each student_id (same as ROLLUP)
  • A subtotal for each school_year
  • An overall total (same as ROLLUP)

It results in more subtotals, and if you’re after subtotals for every combination, it can be useful.

If we run this query above, we get these results:

STUDENT_IDSCHOOL_YEARAVG_GRADES
1201348.516
1201449.13375
1201554.79733333
1(null)50.93583333
2201351.58413793
2201458.69724138
2201548.43166667
2(null)52.85352273
3201351.27193548
3201449.39969697
3201559.40645161
3(null)53.276
4201351.53291667
4201441.83027778
4201551.62931034
4(null)47.63966292
5201352.45148148
5201448.57058824
5201544.38259259
5(null)48.47636364
6201355.34291667
6201454.519
6201543.62923077
6(null)51.227
7201352.31935484
7201459.87941176
7201549.50805556
7(null)53.86227723
8201339.23448276
8201451.53454545
8201550.70818182
8(null)47.49273684
9201356.48911765
9201454.81294118
9201546.16115385
9(null)53.02617021
10201349.01794118
10201447.966
10201554.76807692
10(null)50.32844444
11201349.3946875
11201452.29625
11201552.830625
11(null)51.5071875
(null)201350.61424615
(null)201451.62498567
(null)201550.73254601
(null)(null)51.00556

You can see that the results are similar, but there are subtotals for school_year values.

If you have a GROUP BY CUBE with three columns, you’ll get even more combinations.

 

GROUP BY GROUPING SETS in SQL

Another way to group columns is to use the GROUPING SETS keyword.

If you don’t need to get subtotals for all possible combinations, but you need more than just a standard GROUP BY, you can use the GROUPING SETS keyword.

GROUPING SETS in SQL allows you to be very specific about how you want your data grouped.

You can specify columns, in sets of brackets, that make up the groups.

GROUPING SETS is available in Oracle, SQL Server, and PostgreSQL. GROUPING SETS is not available in MySQL.

For example, let’s say we wanted to see the student ID, subject ID, school year, and an AVG of the grade.

If we use GROUP BY CUBE, we will get:

  • Groups for each combination of the student ID, subject ID, and school year
  • Subtotals for each student ID and subject ID
  • Subtotals for each student ID and school year
  • Subtotals for each student ID
  • Subtotals for each subject ID and school year
  • Subtotals for each subject ID
  • Subtotals for each school year
  • An overall subtotal

As you can see, this is a lot of totals to display.

If we don’t need them all, we can use GROUP BY GROUPING SETS to specify what to show.

We could say that we only want to see subtotals of student ID and subject ID, and then subtotals of school_year.

Our query would look like this:

SELECT student_id,
subject_id,
school_year,
SUM(student_grade) AS total_grades
FROM sample_group_table
GROUP BY GROUPING SETS ((student_id, subject_id), school_year)
ORDER BY student_id, subject_id, school_year;

Results:

STUDENT_IDSUBJECT_IDSCHOOL_YEARTOTAL_GRADES
11(null)324.53
12(null)337.96
13(null)580.53
14(null)599.24
15(null)416.48
16(null)375.21
17(null)402.35
18(null)518.44
19(null)723.87
21(null)389.73
22(null)360.23
23(null)536.75
24(null)653.33
25(null)424.96
26(null)450.28
27(null)535.62
28(null)706.6
29(null)593.61
31(null)657.56
32(null)1035.52
33(null)248.39
34(null)441.32
35(null)620.95
36(null)457.43
37(null)554.82
38(null)396.05
39(null)649.18
41(null)355.36
42(null)687.64
43(null)624.78
44(null)412.19
45(null)318.31
46(null)334.57
47(null)415.91
48(null)570.98
49(null)520.19
51(null)539.81
52(null)451.71
53(null)382.12
54(null)437.65
55(null)683.88
56(null)357.87
57(null)314.93
58(null)380.62
59(null)717.33
61(null)513.61
62(null)622.74
63(null)257.64
64(null)621.51
65(null)510.28
66(null)349.56
67(null)247.56
68(null)525.26
69(null)450
71(null)833.54
72(null)566.76
73(null)582.97
74(null)738.63
75(null)944.43
76(null)409.67
77(null)301.28
78(null)599.53
79(null)463.28
81(null)1019.64
82(null)536.57
83(null)337.49
84(null)484.55
85(null)309.76
86(null)701.6
87(null)410.35
88(null)298.08
89(null)413.77
91(null)492.77
92(null)1027.5
93(null)497.84
94(null)473.89
95(null)579.51
96(null)656.75
97(null)437.81
98(null)507.82
99(null)310.57
101(null)621.87
102(null)640.61
103(null)448.57
104(null)496.4
105(null)505.89
106(null)483.01
107(null)454.69
108(null)737.64
109(null)140.88
111(null)701.42
112(null)497.23
113(null)674.54
114(null)531.65
115(null)655.58
116(null)358.36
117(null)618.88
118(null)610.54
119(null)296.49
(null)(null)201316449.63
(null)(null)201418017.12
(null)(null)201516538.81

It shows groups for student ID and subject ID, and NULL for school year – indicating this is a subtotal.

It then shows a subtotal for each school year value, at the bottom.

There is no overall total. But we can add one with ().

SELECT student_id,
subject_id,
school_year,
SUM(student_grade) AS total_grades
FROM sample_group_table
GROUP BY GROUPING SETS ((student_id, subject_id), school_year, ())
ORDER BY student_id, subject_id, school_year;

Results:

STUDENT_IDSUBJECT_IDSCHOOL_YEARTOTAL_GRADES
11(null)324.53
12(null)337.96
13(null)580.53
14(null)599.24
15(null)416.48
16(null)375.21
17(null)402.35
18(null)518.44
19(null)723.87
21(null)389.73
22(null)360.23
23(null)536.75
24(null)653.33
25(null)424.96
26(null)450.28
27(null)535.62
28(null)706.6
29(null)593.61
31(null)657.56
32(null)1035.52
33(null)248.39
34(null)441.32
35(null)620.95
36(null)457.43
37(null)554.82
38(null)396.05
39(null)649.18
41(null)355.36
42(null)687.64
43(null)624.78
44(null)412.19
45(null)318.31
46(null)334.57
47(null)415.91
48(null)570.98
49(null)520.19
51(null)539.81
52(null)451.71
53(null)382.12
54(null)437.65
55(null)683.88
56(null)357.87
57(null)314.93
58(null)380.62
59(null)717.33
61(null)513.61
62(null)622.74
63(null)257.64
64(null)621.51
65(null)510.28
66(null)349.56
67(null)247.56
68(null)525.26
69(null)450
71(null)833.54
72(null)566.76
73(null)582.97
74(null)738.63
75(null)944.43
76(null)409.67
77(null)301.28
78(null)599.53
79(null)463.28
81(null)1019.64
82(null)536.57
83(null)337.49
84(null)484.55
85(null)309.76
86(null)701.6
87(null)410.35
88(null)298.08
89(null)413.77
91(null)492.77
92(null)1027.5
93(null)497.84
94(null)473.89
95(null)579.51
96(null)656.75
97(null)437.81
98(null)507.82
99(null)310.57
101(null)621.87
102(null)640.61
103(null)448.57
104(null)496.4
105(null)505.89
106(null)483.01
107(null)454.69
108(null)737.64
109(null)140.88
111(null)701.42
112(null)497.23
113(null)674.54
114(null)531.65
115(null)655.58
116(null)358.36
117(null)618.88
118(null)610.54
119(null)296.49
(null)(null)201316449.63
(null)(null)201418017.12
(null)(null)201516538.81
(null)(null)(null)51005.56

You can see the results are the same, but there is now an overall total.

So, the GROUPING SETS option is quite flexible in what you want to display.

 

Composite Grouping

With the different ways to group data, you can be pretty flexible with what you want to display.

If you use ROLLUP, CUBE, or GROUPING SETS, you can use brackets within the grouping criteria to further define how you want to group data.

Let’s say you have a query that does a ROLLUP:

ROLLUP (a, b, c, d)

This will create groups and subtotals for:

  • a, b, c, d
  • a, b, c
  • a, b
  • a
  • Overall

If you use CUBE:

CUBE (a, b, c, d)

You will get groups and subtotals for:

  • a, b, c, d
  • a, b, c
  • a, b, d
  • a, c, d
  • a, b, d
  • a, b
  • a, c
  • a, d
  • a
  • b, c, d
  • b, c
  • b, d
  • b
  • c, d
  • c
  • d
  • Overall

Composite grouping allows you to add columns together, within brackets, inside the CUBE or ROLLUP brackets. This means they are treated as a single unit.

For example:

ROLLUP ((a, b), c, d)

This will produce groups of:

  • a, b, c, d
  • a, b, c
  • a, b
  • ()

There is no group made for “a”, because it is combined with “b” to make a single group.

You can use a similar concept with CUBE:

CUBE (a, (b, c), d)

This will produce groups of:

  • a, b, c, d
  • a, b, c
  • a, d
  • a
  • b, c, d
  • b, c
  • d
  • Overall

It results in a much smaller set of groups.

 

GROUPING Function

There are several SQL functions related to SQL GROUP BY and the grouping feature that can help you identify and work with groups.

Earlier in our examples, we noticed that subtotals had values of NULL for the columns that weren’t being considered in the group.

How can we tell which values are subtotals and which values are NULL because of the underlying data?

This is what the GROUPING function can help with.

The GROUPING function takes a single column as a parameter. The output is either 1 or 0:

  • It returns 1 if the column is being used as part of a subtotal and is showing NULL.
  • It returns 0 if the underlying value is NULL or any other value.

This is available in Oracle, SQL Server, MySQL, and PostgreSQL.

If we use an earlier example, and add the GROUPING function, we can see which rows are the subtotal rows:

SELECT student_id,
school_year,
SUM(student_grade) AS total_grade,
GROUPING(student_id) AS grp_stud,
GROUPING(school_year) AS grp_yr
FROM sample_group_table
GROUP BY ROLLUP (student_id, school_year)
ORDER BY student_id, school_year;

Results:

STUDENT_IDSCHOOL_YEARTOTAL_GRADEGRP_STUDGRP_YR
120131455.4800
120141179.2100
120151643.9200
1(null)4278.6101
220131495.9400
220141702.2200
220151452.9500
2(null)4651.1101
320131589.4300
320141630.1900
320151841.600
3(null)5061.2201
420131236.7900
420141505.8900
420151497.2500
4(null)4239.9301
520131416.1900
520141651.400
520151198.3300
5(null)4265.9201
620131328.2300
620141635.5700
620151134.3600
6(null)4098.1601
720131621.900
720142035.900
720151782.2900
7(null)5440.0901
820131137.800
820141700.6400
820151673.3700
8(null)4511.8101
920131920.6300
920141863.6400
920151200.1900
9(null)4984.4601
1020131666.6100
1020141438.9800
1020151423.9700
10(null)4529.5601
1120131580.6300
1120141673.4800
1120151690.5800
11(null)4944.6901
(null)(null)51005.5611

You can see that there are several values in the GRP_STUD column and the GRP_YR column that have a value of 1. This means that they are subtotal rows.

You can then use other programs or code to treat these rows differently if you need to. It’s much more reliable than just checking if the other columns are NULL.

You can also order by these columns, to have the subtotals at the start or the end, instead of in the middle of the results.

SELECT student_id,
school_year,
SUM(student_grade) AS total_grade,
GROUPING(student_id) AS grp_stud,
GROUPING(school_year) AS grp_yr
FROM sample_group_table
GROUP BY ROLLUP (student_id, school_year)
ORDER BY GROUPING(student_id), GROUPING(school_year), student_id, school_year;

Results:

STUDENT_IDSCHOOL_YEARTOTAL_GRADEGRP_STUDGRP_YR
120131455.4800
120141179.2100
120151643.9200
220131495.9400
220141702.2200
220151452.9500
320131589.4300
320141630.1900
320151841.600
420131236.7900
420141505.8900
420151497.2500
520131416.1900
520141651.400
520151198.3300
620131328.2300
620141635.5700
620151134.3600
720131621.900
720142035.900
720151782.2900
820131137.800
820141700.6400
820151673.3700
920131920.6300
920141863.6400
920151200.1900
1020131666.6100
1020141438.9800
1020151423.9700
1120131580.6300
1120141673.4800
1120151690.5800
1(null)4278.6101
2(null)4651.1101
3(null)5061.2201
4(null)4239.9301
5(null)4265.9201
6(null)4098.1601
7(null)5440.0901
8(null)4511.8101
9(null)4984.4601
10(null)4529.5601
11(null)4944.6901
(null)(null)51005.5611

 

GROUPING_ID Function

The GROUPING_ID function is another function you can use when working with SQL GROUP BY. It can be used to identify:

  • Whether the row is a subtotal
  • What “level” the subtotal is for

You provide the columns that you’re grouping by as parameters. The GROUPING_ID function returns a number indicating the grouping level:

  • 0 for combinations of each column
  • 1 for subtotals of column 1
  • 2 for subtotals of column 2
  • And so on…

The highest value returned is also for the overall total, which will depend on how many groups you have.

This is available in Oracle, SQL Server, and PostgreSQL. GROUPING_ID is not available in MySQL.

For example:

SELECT student_id,
school_year,
SUM(student_grade) AS total_grade,
GROUPING_ID(student_id, school_year) AS grp_id
FROM sample_group_table
GROUP BY CUBE(student_id, school_year)
ORDER BY student_id, school_year;

Results:

STUDENT_IDSCHOOL_YEARTOTAL_GRADEGRP_ID
120131455.480
120141179.210
120151643.920
1(null)4278.611
220131495.940
220141702.220
220151452.950
2(null)4651.111
320131589.430
320141630.190
320151841.60
3(null)5061.221
420131236.790
420141505.890
420151497.250
4(null)4239.931
520131416.190
520141651.40
520151198.330
5(null)4265.921
620131328.230
620141635.570
620151134.360
6(null)4098.161
720131621.90
720142035.90
720151782.290
7(null)5440.091
820131137.80
820141700.640
820151673.370
8(null)4511.811
920131920.630
920141863.640
920151200.190
9(null)4984.461
1020131666.610
1020141438.980
1020151423.970
10(null)4529.561
1120131580.630
1120141673.480
1120151690.580
11(null)4944.691
(null)201316449.632
(null)201418017.122
(null)201516538.812
(null)(null)51005.563

You can see the grp_id column shows different values depending on if the row is a subtotal or not.

 

GROUP_ID Function

The GROUP_ID function is used to assign a number to each group that results from a SELECT query clause with a GROUP BY clause.

It’s used to identify any duplicate groups in your output, and works with queries with aggregate functions.

For each unique group that is returned, GROUP_ID will return 0. If a duplicate is found, a value >0 will be returned.

This GROUP_ID function is only available in Oracle. It is not available in SQL Server, MySQL, or PostgreSQL.

For example:

SELECT student_id,
school_year,
SUM(student_grade) AS total_grade,
GROUP_ID()
FROM sample_group_table
GROUP BY student_id, CUBE(student_id, school_year)
ORDER BY student_id, school_year;

Results:

STUDENT_IDSCHOOL_YEARTOTAL_GRADEGROUP_ID()
120131455.481
120131455.480
120141179.210
120141179.211
120151643.920
120151643.921
1(null)4278.611
1(null)4278.610
220131495.940
220131495.941
220141702.221
220141702.220
220151452.950
220151452.951
2(null)4651.111
2(null)4651.110
320131589.431
320131589.430
320141630.191
320141630.190
320151841.61
320151841.60
3(null)5061.220
3(null)5061.221
420131236.790
420131236.791
420141505.890
420141505.891
420151497.250
420151497.251
4(null)4239.930
4(null)4239.931
520131416.190
520131416.191
520141651.40
520141651.41
520151198.330
520151198.331
5(null)4265.921
5(null)4265.920
620131328.230
620131328.231
620141635.570
620141635.571
620151134.361
620151134.360
6(null)4098.161
6(null)4098.160
720131621.91
720131621.90
720142035.90
720142035.91
720151782.290
720151782.291
7(null)5440.091
7(null)5440.090
820131137.80
820131137.81
820141700.640
820141700.641
820151673.371
820151673.370
8(null)4511.811
8(null)4511.810
920131920.631
920131920.630
920141863.640
920141863.641
920151200.190
920151200.191
9(null)4984.461
9(null)4984.460
1020131666.611
1020131666.610
1020141438.981
1020141438.980
1020151423.970
1020151423.971
10(null)4529.560
10(null)4529.561
1120131580.631
1120131580.630
1120141673.481
1120141673.480
1120151690.581
1120151690.580
11(null)4944.691
11(null)4944.690

You can see that some records here have a GROUP_ID of 1.

 

Concatenated Grouping

It’s possible to combine different grouping types into the one SQL GROUP BY clause.

You can specify a combination of GROUP BY, GROUPING SETS, CUBEs or ROLLUPs in a single SQL GROUP BY clause.

The groups are then concatenated and grouping sets are then made.

For example:

SELECT student_id,
school_year,
SUM(student_grade) AS total_grade
FROM sample_group_table
GROUP BY ROLLUP (student_id, school_year)
ORDER BY student_id, school_year;

This query groups by a rollup of student_id and school_year.

Now, what if we add a regular group by to that as well?

SELECT student_id,
school_year,
SUM(student_grade) AS total_grade,
FROM sample_group_table
GROUP BY student_id, ROLLUP (student_id, school_year)
ORDER BY student_id, school_year;

Results:

STUDENT_IDSCHOOL_YEARTOTAL_GRADE
120131455.48
120141179.21
120151643.92
1(null)4278.61
1(null)4278.61
220131495.94
220141702.22
220151452.95
2(null)4651.11
2(null)4651.11
320131589.43
320141630.19
320151841.6
3(null)5061.22
3(null)5061.22
420131236.79
420141505.89
420151497.25
4(null)4239.93
4(null)4239.93
520131416.19
520141651.4
520151198.33
5(null)4265.92
5(null)4265.92
620131328.23
620141635.57
620151134.36
6(null)4098.16
6(null)4098.16
720131621.9
720142035.9
720151782.29
7(null)5440.09
7(null)5440.09
820131137.8
820141700.64
820151673.37
8(null)4511.81
8(null)4511.81
920131920.63
920141863.64
920151200.19
9(null)4984.46
9(null)4984.46
1020131666.61
1020141438.98
1020151423.97
10(null)4529.56
10(null)4529.56
1120131580.63
1120141673.48
1120151690.58
11(null)4944.69
11(null)4944.69

This results in extra grouping sets.

 

Summary of SQL GROUP BY Differences Between Vendors

This table shows the differences in the SQL GROUP BY clause between each database vendor.

CriteriaOracleSQL ServerMySQLPostgreSQL
RollupYesYesYesYes
CubeYesYesNoYes
Grouping SetsYesYesNoYes
GROUPING FunctionYesYesYes, in 8.0.1Yes
GROUPING_ID FunctionYesYesNoNo
GROUP_ID FunctionYesNoNoNo

 

Summary

So, in summary:

  • Grouping is needed when working with aggregate functions.
  • The SQL GROUP BY clause allows you to specify the columns to group by.
  • The HAVING clause allows you to filter records after the GROUP BY is applied.
  • ROLLUP lets you group by sets of columns, from right to left.
  • CUBE lets you group by all combinations of columns.
  • GROUPING SETS lets you specify the sets or subtotals you require.
  • Composite grouping means you can use brackets to force several columns to be treated as a single unit for grouping.
  • Concatenated grouping means you can specify different group types within the same GROUP BY clause, separated by a comma.
  • The GROUPING function shows 1 if a row is a subtotal row and 0 if it is not.
  • The GROUPING_ID function shows a number indicating what level of a group the row relates to.
  • The GROUP_ID function shows 0 if the record is unique or 1 if it is a duplicate.

That’s how the GROUP BY clause and related keywords work.

Lastly, if you enjoy the information and career advice I’ve been providing, sign up to my newsletter below to stay up-to-date on my articles. You’ll also receive a fantastic bonus. Thanks!

Get Your SQL Cheat Sheet

Download the SQL Cheat Sheets: common commands and syntax - to save you time.
You'll get them for Oracle, SQL Server, MySQL, and PostgreSQL.
Print them or use them as an easy reference.

Get Your SQL Cheat Sheets Now:

Want Some SQL Cheat Sheets?

sql-cheat-sheet-images-v2

Download the SQL Cheat Sheets: common commands and syntax - to save you time.

You'll get them for Oracle, SQL Server, MySQL, and PostgreSQL.

Print them or use them as an easy reference.