SQL Aggregate Functions: A Guide

Jan 12, 2021 · 12 min read · sql-functions ·

SQL aggregate functions are very useful and are some of the most commonly used functions. Learn all about SQL aggregate functions and what they do in this article.

What Are SQL Aggregate Functions?

Aggregate functions are functions that allow you to view a single piece of data from multiple pieces of data.

Most functions in SQL operate on a single row or a single record, such as DECODE or LENGTH.

However, aggregate functions are different because you can use them on multiple rows and get a single value as a result.

An example of an aggregate function is SUM. This function allows you to input a range of values, such as those in a column, and get the sum total of those numbers.

If you had a table that looked like this:

1SELECT * FROM student;

STUDENT_ID	FIRST_NAME	LAST_NAME	FEES_REQUIRED	FEES_PAID	ADDRESS_STATE
1	John	Smith	500	100	New York
2	Susan	Johnson	150	150	Colorado
3	Tom	Capper	350	320	Nevada
4	Mark	Holloway	500	410	New York
5	Steven	Webber	100	80	New York
6	Julie	Armstrong	100	0	Texas
7	Michelle	Randall	250	(null)	Florida
8	Andrew	Cooper	800	400	Texas
9	Robert	Pickering	110	100	Colorado
10	Tanya	Hall	150	150	Texas

You could then use an aggregate function, such as SUM, to get the total of all values in the fees_paid column.

1SELECT SUM(fees_paid)
2FROM student;

SUM(FEES_PAID)
1710

This function collapses the many rows from the table into a single row that represents the SUM aggregate function.

There are many more aggregate functions, which we'll look at in this article.

You can also do some interesting things with aggregate functions, such as selecting all rows that meet a maximum value or selecting the maximum value based on a subgroup.

What Is The GROUP BY Clause?

SQL has something called the "group by" clause. What does it mean? What does it do?

The GROUP BY clause allows you to use aggregate functions on parts of your data.

The example above showed the SUM of all of the fees_paid values in the table.

But what if you wanted to see the SUM of fees_paid for each address_state?

1SELECT address_state,
2SUM(fees_paid)
3FROM student;

The result depends on which database you run this on:

Oracle:

ORA-00937: not a single-group group function

The ORA-00937 occurs whenever you use an aggregate function and don't include all of the non-aggregate functions from the SELECT clause in a GROUP BY clause;

SQL Server:

Column 'student.address_state' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

MySQL:

MySQL will actually show you a result. However, it shows the first address_state and the sum of all fees_paid records. This seems like it's correct but it is not.

PostgreSQL:

ERROR: column "student.address_state" must appear in the GROUP BY clause or be used in an aggregate function Position: 8

You'll get these errors because we haven't grouped our data.

This is where the GROUP BY clause comes in.

The GROUP BY clause allows you to group your aggregate function by other values.

You'll need to add GROUP BY, then the names of all of the other fields in the SELECT clause.

1SELECT address_state,
2SUM(fees_paid)
3FROM student
4GROUP BY address_state;

Result:

ADDRESS_STATE	SUM(FEES_PAID)
Texas	550
New York	590
Colorado	250
Nevada	320
Florida	(null)

This tells the database that you want to see the SUM of fees paid, but group by all of the address_state values.

What Is The HAVING Clause?

The HAVING clause is another clause that is relevant for aggregate functions.

It allows you to limit the results that are returned based on data after it has been grouped.

What if you wanted to see the SUM of the fees_paid values for each state, but only where the sum was over 500?

You could try use a WHERE clause and say WHERE fees_paid > 500.

1SELECT address_state,
2SUM(fees_paid)
3FROM student
4WHERE fees_paid > 500
5GROUP BY address_state;

Result:

ADDRESS_STATE	SUM(FEES_PAID)

However, in this case, you'll get no results.

Why is this?

This is because the WHERE clause is run first, which eliminates rows that don't meet the criteria. In this case, all rows where fees_paid <= 500 are removed from the result set.

Then, the SUM and GROUP BY are performed. However, there are no rows in the result set to do this to.

How can we get the result we want?

Use the HAVING clause.

The HAVING clause is similar to the WHERE clause, but it works on data after it has been grouped.

This is an important point to remember.

The WHERE clause works on data before it has been grouped, and the HAVING clause works on data after it has been grouped.

The syntax is similar to a WHERE clause.

1SELECT address_state,
2SUM(fees_paid)
3FROM student
4GROUP BY address_state
5HAVING fees_paid > 500;

Here's what you'll see.

Oracle:

ORA-00979: not a GROUP BY expression

SQL Server:

Column 'student.fees_paid' is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause.

MySQL:

Unknown column 'fees_paid' in 'having clause'

PostgreSQL:

ERROR: column "student.fees_paid" must appear in the GROUP BY clause or be used in an aggregate function Position: 81

I'm getting this error because I've specified the fees_paid in the HAVING clause, and I need to specify the aggregate function as well.

So, instead of saying HAVING fees_paid, I need to say HAVING SUM(fees_paid), because that's the value we are checking.

1SELECT address_state,
2SUM(fees_paid)
3FROM student
4GROUP BY address_state
5HAVING SUM(fees_paid) > 500;

Result:

ADDRESS_STATE	SUM(FEES_PAID)
Texas	550
New York	590

This now shows the result that we want.

DISTINCT, UNIQUE, and ALL with Aggregate Functions

SQL allows you to use several optional keywords within aggregate functions, which are:

DISTINCT
UNIQUE
ALL

DISTINCT and UNIQUE are synonymous with each other so I'll refer to it as DISTINCT in this article.

They are used in this way:

1SUM ( [ALL | DISTINCT ] expression)

The ALL keyword is the default and means that the aggregate function will consider all values when performing its calculation.

The DISTINCT keyword can be specified, and it means the aggregate function will only consider distinct or unique values of the expression in the calculation. This feature is offered by Oracle, SQL Server, MySQL, and PostgreSQL (and possibly others).

Let's see an example.

1SELECT SUM(fees_paid)
2FROM student;

Result:

SUM(FEES_PAID)
1710

This shows the normal SUM function without DISTINCT or ALL.

1SELECT SUM(ALL fees_paid)
2FROM student;

Result:

SUM(ALLFEES_PAID)
1710

If we add the ALL function, it shows the same as the first example. This is because ALL is the default.

Let's add the DISTINCT keyword:

1SELECT SUM(DISTINCT fees_paid)
2FROM student;

SUM(DISTINCTFEES_PAID)
1460

If we add the DISTINCT keyword, the SUM is smaller, because only distinct values are considered.

How Does the Database Handle NULL Values in Aggregate Functions?

In SQL, almost all of the aggregate functions ignore NULL values.

The functions that consider NULL values are:

COUNT(*)
GROUPING
GROUPING_ID

If you want to include the records that have NULL values, you can convert them to another value using the NVL function.

What happens if you have NULLs in your data and you use an aggregate function on them?

If you use COUNT you'll get either a number for the count of records or 0.

If you use any other function, and you have only NULL values in the data set, the function will return NULL.

Can You Use Nested Aggregate Functions?

Yes, you can.

Let's say you have the SUM of fees_paid for students in each state, as we mentioned earlier.

1SELECT address_state,
2SUM(fees_paid)
3FROM student
4GROUP BY address_state;

Result:

ADDRESS_STATE	SUM(FEES_PAID)
Texas	550
New York	590
Colorado	250
Nevada	320
Florida	(null)

What if you wanted to find the average of the SUM of fees paid in each state?

You can wrap the SUM function in another aggregate function, such as AVG.

1SELECT
2AVG(SUM(fees_paid))
3FROM student
4GROUP BY address_state;

Result:

AVG(SUM(FEES_PAID)
427.5

The calculation here will find the SUM of fees_paid for each address_state, and then get the AVG of those values.

However, as you might have noticed, you need to remove the field from the SELECT clause. This is because you're trying to get the overall average, not the average per state.

List of SQL Aggregate Functions

Here's a list of SQL aggregate functions in Oracle, MySQL, SQL Server, and PostgreSQL, with a short description of what they do.

I have written detailed guides on many functions, and where a guide exists, I've linked the function name to it in this table. So, if you click on the function name, you'll find more information on how to use it and see some examples.

Function	Database	Definition
APPROX_COUNT_DISTINCT	Oracle, SQL Server	Finds the approximate number of distinct rows in the expression.
ARRAY_AGG	PostgreSQL	Collects all input values into an array
AVG	Oracle, SQL Server, MySQL, PostgreSQL	Finds the average of the expression.
BIT_AND	MySQL, PostgreSQL	Returns the result of a bitwise AND operation
BIT_OR	MySQL, PostgreSQL	Returns the result of a bitwise OR operation
BIT_XOR	MySQL	Returns the result of a bitwise XOR operation
BOOL_AND	PostgreSQL	Returns true if all non-null input values are true, otherwise it returns false.
BOOL_OR	PostgreSQL	Returns true if any non-null input value is true, otherwise it returns false.
CHECKSUM_AGG	SQL Server	Finds the checksum of values in a group.
COLLECT	Oracle	Creates a nested table from the provided column.
CORR	Oracle, PostgreSQL	Finds the coefficient of correlation of a set of number pairs.
CORR_K	Oracle	Calculates the Kendall's tau-b correlation coefficient.
CORR_S	Oracle	Calculates the Spearman's rho correlation coefficient.
COUNT	Oracle, SQL Server, MySQL, PostgreSQL	Finds the number of rows returned by a query.
COUNT_BIG	SQL Server	Counts the number of values found. Similar to COUNT but returns a BIGINT.
COVAR_POP	Oracle, PostgreSQL	Finds the population covariance of a set of number pairs.
COVAR_SAMP	Oracle, PostgreSQL	Finds the sample covariance of a set of number pairs.
CUME_DIST	Oracle, SQL Server, MySQL, PostgreSQL	Finds the cumulative distribution of a value in a group of values.
DENSE_RANK	Oracle, MySQL, PostgreSQL	Finds the rank of a row in an ordered group of rows, and rows with the same value get the same ranks but rank numbers are not skipped. This can result in a ranking of 1, 2, 3, 3, 4 (rank 4 is still used, even if there is a tie for rank 3).
EVERY	PostgreSQL	Same as BOOL_AND
FIRST (Oracle) FIRST_VALUE (SQL Server, MySQL)	Oracle, SQL Server, MySQL	Used with ranking functions to get the first value.
GROUP_CONCAT	MySQL	Concatenates a string of values.
GROUP_ID	Oracle	Identifies an ID of a grouping level in GROUP BY clauses.
GROUPING	Oracle, SQL Server	Identifies superaggregate rows from regular grouped rows.
GROUPING_ID	Oracle, SQL Server	Identifies the field that has been grouped by.
JSON_ARRAYAGG (MySQL) JSON_AGG (PostgreSQL)	MySQL, PostgreSQL	Aggregates a result set as a single JSON array.
JSON_OBJECTAGG (MySQL) JSON_OBJECT_AGG (PostgreSQL)	MySQL	Aggregates a result set as a set of key-value pairs.
LAST (Oracle) LAST_VALUE (SQL Server, MySQL)	Oracle, SQL Server, MySQL	Used with ranking functions to get the last value.
LISTAGG	Oracle	Orders data within a set of data and then concatenates it with a specified character.
MAX	Oracle, SQL Server, MySQL, PostgreSQL	Finds the maximum value of the expression.
MEDIAN	Oracle	Finds the middle value of a set of data.
MODE	PostgreSQL	Finds the most frequently-used value in a set of data.
MIN	Oracle, SQL Server, MySQL, PostgreSQL	Finds the minimum value of the expression.
PERCENT_RANK	Oracle, SQL Server, MySQL, PostgreSQL	Finds the percentage rank of a row, similar to the CUME_DIST function.
PERCENTILE_CONT	Oracle, SQL Server, PostgreSQL	Finds a value that would fall within the specified range using a specified percentage value and sort method.
PERCENTILE_DISC	Oracle, SQL Server, PostgreSQL	Finds a value that would fall within the specified range using a specified percentage value and sort method.
RANK	Oracle, MySQL, PostgreSQL	Calculates a rank of a value in a group of values, and rows with the same value get the same rank with rank numbers getting skipped. This can result in a ranking of 1, 2, 3, 3, 5 (rank 4 is skipped because there is a tie).
REGR_ Functions	Oracle, PostgreSQL	Fits an ordinary-least-squares regression line to a set of number pairs.
STATS_BINOMIAL_TEST	Oracle	Tests the difference between a sample proportion and a given proportion.
STATS_CROSSTAB	Oracle	Uses a method to analyse two variables using crosstab.
STATS_F_TEST	Oracle	Checks whether two values are significantly different.
STATS_KS_TEST	Oracle	Checks whether two samples are from the same population, or different populations with the same distribution.
STATS_MODE	Oracle	Finds the value that occurs the most.
STATS_MW_TEST	Oracle	Performs a Mann Whitney test on two populations.
STATS_ONE_WAY_ANOVA	Oracle	Tests the difference in means for groups for statistical significance.
STATS_T_TEST_*	Oracle	Checks the significance of a difference of means.
STATS_WSR_TEST	Oracle	Performs a Wilcoxon Signed Ranks test to see if the median of the differences is significantly different from 0.
STDDEV (Oracle, MySQL, PostgreSQL) STDEV (SQL Server)	Oracle, SQL Server, MySQL, PostgreSQL	Finds the standard deviation of the set of values.
STDDEV_POP (Oracle, MySQL, PostgreSQL) STDEVP (SQL Server) STD (MySQL)	Oracle, SQL Server, MySQL, PostgreSQL	Finds the population standard deviation of a set of values.
STDDEV_SAMP	Oracle, MySQL, PostgreSQL	Finds the cumulative sample standard deviation of a set of values.
STRING_AGG	SQL Server, PostgreSQL	Concatenates a set of values with a specified separator
SUM	Oracle, SQL Server, MySQL, PostgreSQL	Finds the sum of values.
SYS_OP_ZONE_ID	Oracle	Finds a Zone ID from a specified row ID
SYS_XMLAGG	Oracle	Aggregates XML documents and produces a single XML document.
VAR_POP (Oracle, MySQL, PostgreSQL) VARP (SQL Server)	Oracle, SQL Server, MySQL, PostgreSQL	Finds the population variance of a set of numbers.
VAR_SAMP	Oracle, MySQL, PostgreSQL	Finds the sample variance of a set of numbers.
VARIANCE (Oracle, MySQL, PostgreSQL) VAR (SQL Server)	Oracle, SQL Server, MySQL, PostgreSQL	Finds the variance of a set of numbers.
XMLAGG	Oracle, PostgreSQL	Aggregates XML documents and produces a single XML document.

So, that's how you use aggregate functions in SQL. If you have any questions, please leave a comment at the end of this post.