FB

SQL Aggregate Functions: A Guide

SQL aggregate functions are very useful and are some of the most commonly used functions. Learn all about SQL aggregate functions are and what they do in this article.

 

What Are SQL Aggregate Functions?

Aggregate functions are functions that allow you to view a single piece of data from multiple pieces of data.

Most functions in SQL operate on a single row or a single record, such as DECODE or LENGTH.

However, aggregate functions are different because you can use them on multiple rows and get a single value as a result.

An example of an aggregate function is SUM. This function allows you to input a range of values, such as those in a column, and get the sum total of those numbers.

If you had a table that looked like this:

SELECT * FROM student;
STUDENT_ID FIRST_NAME LAST_NAME FEES_REQUIRED FEES_PAID ADDRESS_STATE
1 John Smith 500 100 New York
2 Susan Johnson 150 150 Colorado
3 Tom Capper 350 320 Nevada
4 Mark Holloway 500 410 New York
5 Steven Webber 100 80 New York
6 Julie Armstrong 100 0 Texas
7 Michelle Randall 250 (null) Florida
8 Andrew Cooper 800 400 Texas
9 Robert Pickering 110 100 Colorado
10 Tanya Hall 150 150 Texas

You could then use an aggregate function, such as SUM, to get the total of all values in the fees_paid column.

SELECT SUM(fees_paid)
FROM student;
SUM(FEES_PAID)
1710

This function collapses the many rows from the table into a single row that represents the SUM aggregate function.

There are many more aggregate functions, which we’ll look at in this article.

You can also do some interesting things with aggregate functions, such as selecting all rows that meet a maximum value or selecting the maximum value based on a subgroup.

 

What Is The GROUP BY Clause?

SQL has something called the “group by” clause. What does it mean? What does it do?

The GROUP BY clause allows you to use aggregate functions on parts of your data.

The example above showed the SUM of all of the fees_paid values in the table.

But what if you wanted to see the SUM of fees_paid for each address_state?

SELECT address_state,
SUM(fees_paid)
FROM student;

The result depends on which database you run this on:

Oracle:

ORA-00937: not a single-group group function

The ORA-00937 occurs whenever you use an aggregate function and don’t include all of the non-aggregate functions from the SELECT clause in a GROUP BY clause;

SQL Server:

Column ‘student.address_state’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.

MySQL:

MySQL will actually show you a result. However, it shows the first address_state and the sum of all fees_paid records. This seems like it’s correct but it is not.

PostgreSQL:

ERROR: column “student.address_state” must appear in the GROUP BY clause or be used in an aggregate function Position: 8

You’ll get these errors because we haven’t grouped our data.

This is where the GROUP BY clause comes in.

The GROUP BY clause allows you to group your aggregate function by other values.

You’ll need to add GROUP BY, then the names of all of the other fields in the SELECT clause.

SELECT address_state,
SUM(fees_paid)
FROM student
GROUP BY address_state;

Result:

ADDRESS_STATE SUM(FEES_PAID)
Texas 550
New York 590
Colorado 250
Nevada 320
Florida (null)

This tells the database that you want to see the SUM of fees paid, but group by all of the address_state values.

 

What Is The HAVING Clause?

The HAVING clause is another clause that is relevant for aggregate functions.

It allows you to limit the results that are returned based on data after it has been grouped.

What if you wanted to see the SUM of the fees_paid values for each state, but only where the sum was over 500?

You could try use a WHERE clause and say WHERE fees_paid > 500.

SELECT address_state,
SUM(fees_paid)
FROM student
WHERE fees_paid > 500
GROUP BY address_state;

Result:

ADDRESS_STATE SUM(FEES_PAID)

However, in this case, you’ll get no results.

Why is this?

This is because the WHERE clause is run first, which eliminates rows that don’t meet the criteria. In this case, all rows where fees_paid <= 500 are removed from the result set.

Then, the SUM and GROUP BY are performed. However, there are no rows in the result set to do this to.

How can we get the result we want?

Use the HAVING clause.

The HAVING clause is similar to the WHERE clause, but it works on data after it has been grouped.

This is an important point to remember.

The WHERE clause works on data before it has been grouped, and the HAVING clause works on data after it has been grouped.

The syntax is similar to a WHERE clause.

SELECT address_state,
SUM(fees_paid)
FROM student
GROUP BY address_state
HAVING fees_paid > 500;

Here’s what you’ll see.

Oracle:

ORA-00979: not a GROUP BY expression

SQL Server:

Column ‘student.fees_paid’ is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause.

MySQL:

Unknown column ‘fees_paid’ in ‘having clause’

PostgreSQL:

ERROR: column “student.fees_paid” must appear in the GROUP BY clause or be used in an aggregate function Position: 81

I’m getting this error because I’ve specified the fees_paid in the HAVING clause, and I need to specify the aggregate function as well.

So, instead of saying HAVING fees_paid, I need to say HAVING SUM(fees_paid), because that’s the value we are checking.

SELECT address_state,
SUM(fees_paid)
FROM student
GROUP BY address_state
HAVING SUM(fees_paid) > 500;

Result:

ADDRESS_STATE SUM(FEES_PAID)
Texas 550
New York 590

This now shows the result that we want.

 

DISTINCT, UNIQUE, and ALL with Aggregate Functions

SQL allows you to use several optional keywords within aggregate functions, which are:

  • DISTINCT
  • UNIQUE
  • ALL

DISTINCT and UNIQUE are synonymous with each other so I’ll refer to it as DISTINCT in this article.

They are used in this way:

SUM ( [ALL | DISTINCT ] expression)

The ALL keyword is the default and means that the aggregate function will consider all values when performing its calculation.

The DISTINCT keyword can be specified, and it means the aggregate function will only consider distinct or unique values of the expression in the calculation. This feature is offered by Oracle, SQL Server, MySQL, and PostgreSQL (and possibly others).

Let’s see an example.

SELECT SUM(fees_paid)
FROM student;

Result:

SUM(FEES_PAID)
1710

This shows the normal SUM function without DISTINCT or ALL.

SELECT SUM(ALL fees_paid)
FROM student;

Result:

SUM(ALLFEES_PAID)
1710

If we add the ALL function, it shows the same as the first example. This is because ALL is the default.

Let’s add the DISTINCT keyword:

SELECT SUM(DISTINCT fees_paid)
FROM student;
SUM(DISTINCTFEES_PAID)
1460

If we add the DISTINCT keyword, the SUM is smaller, because only distinct values are considered.

 

How Does the Database Handle NULL Values in Aggregate Functions?

In SQL, almost all of the aggregate functions ignore NULL values.

The functions that consider NULL values are:

  • COUNT(*)
  • GROUPING
  • GROUPING_ID

If you want to include the records that have NULL values, you can convert them to another value using the NVL function.

What happens if you have NULLs in your data and you use an aggregate function on them?

If you use COUNT you’ll get either a number for the count of records or 0.

If you use any other function, and you have only NULL values in the data set, the function will return NULL.

 

Can You Use Nested Aggregate Functions?

Yes, you can.

Let’s say you have the SUM of fees_paid for students in each state, as we mentioned earlier.

SELECT address_state,
SUM(fees_paid)
FROM student
GROUP BY address_state;

Result:

ADDRESS_STATE SUM(FEES_PAID)
Texas 550
New York 590
Colorado 250
Nevada 320
Florida (null)

What if you wanted to find the average of the SUM of fees paid in each state?

You can wrap the SUM function in another aggregate function, such as AVG.

SELECT
AVG(SUM(fees_paid))
FROM student
GROUP BY address_state;

Result:

AVG(SUM(FEES_PAID)
427.5

The calculation here will find the SUM of fees_paid for each address_state, and then get the AVG of those values.

However, as you might have noticed, you need to remove the field from the SELECT clause. This is because you’re trying to get the overall average, not the average per state.

 

List of SQL Aggregate Functions

Here’s a list of SQL aggregate functions in Oracle, MySQL, SQL Server, and PostgreSQL, with a short description of what they do.

I have written detailed guides on many functions, and where a guide exists, I’ve linked the function name to it in this table. So, if you click on the function name, you’ll find more information on how to use it and see some examples.

Function Database Definition
APPROX_COUNT_DISTINCT Oracle, SQL Server Finds the approximate number of distinct rows in the expression.
ARRAY_AGG PostgreSQL Collects all input values into an array
AVG Oracle, SQL Server, MySQL, PostgreSQL Finds the average of the expression.
BIT_AND MySQL, PostgreSQL Returns the result of a bitwise AND operation
BIT_OR MySQL, PostgreSQL Returns the result of a bitwise OR operation
BIT_XOR MySQL Returns the result of a bitwise XOR operation
BOOL_AND PostgreSQL Returns true if all non-null input values are true, otherwise it returns false.
BOOL_OR PostgreSQL Returns true if any non-null input value is true, otherwise it returns false.
CHECKSUM_AGG SQL Server Finds the checksum of values in a group.
COLLECT Oracle Creates a nested table from the provided column.
CORR Oracle, PostgreSQL Finds the coefficient of correlation of a set of number pairs.
CORR_K Oracle Calculates the Kendall’s tau-b correlation coefficient.
CORR_S Oracle Calculates the Spearman’s rho correlation coefficient.
COUNT Oracle, SQL Server, MySQL, PostgreSQL Finds the number of rows returned by a query.
COUNT_BIG SQL Server Counts the number of values found. Similar to COUNT but returns a BIGINT.
COVAR_POP Oracle, PostgreSQL Finds the population covariance of a set of number pairs.
COVAR_SAMP Oracle, PostgreSQL Finds the sample covariance of a set of number pairs.
CUME_DIST Oracle, SQL Server, MySQL, PostgreSQL Finds the cumulative distribution of a value in a group of values.
DENSE_RANK Oracle, MySQL, PostgreSQL Finds the rank of a row in an ordered group of rows, and rows with the same value get the same ranks but rank numbers are not skipped. This can result in a ranking of 1, 2, 3, 3, 4 (rank 4 is still used, even if there is a tie for rank 3).
EVERY PostgreSQL Same as BOOL_AND
FIRST (Oracle)

FIRST_VALUE (SQL Server, MySQL)

Oracle, SQL Server, MySQL Used with ranking functions to get the first value.
GROUP_CONCAT MySQL Concatenates a string of values.
GROUP_ID Oracle Identifies an ID of a grouping level in GROUP BY clauses.
GROUPING Oracle, SQL Server Identifies superaggregate rows from regular grouped rows.
GROUPING_ID Oracle, SQL Server Identifies the field that has been grouped by.
JSON_ARRAYAGG (MySQL)

JSON_AGG (PostgreSQL)

MySQL, PostgreSQL Aggregates a result set as a single JSON array.
JSON_OBJECTAGG (MySQL)

JSON_OBJECT_AGG (PostgreSQL)

MySQL Aggregates a result set as a set of key-value pairs.
LAST (Oracle)

LAST_VALUE (SQL Server, MySQL)

Oracle, SQL Server, MySQL Used with ranking functions to get the last value.
LISTAGG Oracle Orders data within a set of data and then concatenates it with a specified character.
MAX Oracle, SQL Server, MySQL, PostgreSQL Finds the maximum value of the expression.
MEDIAN Oracle Finds the middle value of a set of data.
MODE PostgreSQL Finds the most frequently-used value in a set of data.
MIN Oracle, SQL Server, MySQL, PostgreSQL Finds the minimum value of the expression.
PERCENT_RANK Oracle, SQL Server, MySQL, PostgreSQL Finds the percentage rank of a row, similar to the CUME_DIST function.
PERCENTILE_CONT Oracle, SQL Server, PostgreSQL Finds a value that would fall within the specified range using a specified percentage value and sort method.
PERCENTILE_DISC Oracle, SQL Server, PostgreSQL Finds a value that would fall within the specified range using a specified percentage value and sort method.
RANK Oracle, MySQL, PostgreSQL Calculates a rank of a value in a group of values, and rows with the same value get the same rank with rank numbers getting skipped. This can result in a ranking of 1, 2, 3, 3, 5 (rank 4 is skipped because there is a tie).
REGR_ Functions Oracle, PostgreSQL Fits an ordinary-least-squares regression line to a set of number pairs.
STATS_BINOMIAL_TEST Oracle Tests the difference between a sample proportion and a given proportion.
STATS_CROSSTAB Oracle Uses a method to analyse two variables using crosstab.
STATS_F_TEST Oracle Checks whether two values are significantly different.
STATS_KS_TEST Oracle Checks whether two samples are from the same population, or different populations with the same distribution.
STATS_MODE Oracle Finds the value that occurs the most.
STATS_MW_TEST Oracle Performs a Mann Whitney test on two populations.
STATS_ONE_WAY_ANOVA Oracle Tests the difference in means for groups for statistical significance.
STATS_T_TEST_* Oracle Checks the significance of a difference of means.
STATS_WSR_TEST Oracle Performs a Wilcoxon Signed Ranks test to see if the median of the differences is significantly different from 0.
STDDEV (Oracle, MySQL, PostgreSQL)

STDEV (SQL Server)

Oracle, SQL Server, MySQL, PostgreSQL Finds the standard deviation of the set of values.
STDDEV_POP (Oracle, MySQL, PostgreSQL)

STDEVP (SQL Server)

STD (MySQL)

Oracle, SQL Server, MySQL, PostgreSQL Finds the population standard deviation of a set of values.
STDDEV_SAMP Oracle, MySQL, PostgreSQL Finds the cumulative sample standard deviation of a set of values.
STRING_AGG SQL Server, PostgreSQL Concatenates a set of values with a specified separator
SUM Oracle, SQL Server, MySQL, PostgreSQL Finds the sum of values.
SYS_OP_ZONE_ID Oracle Finds a Zone ID from a specified row ID
SYS_XMLAGG Oracle Aggregates XML documents and produces a single XML document.
VAR_POP (Oracle, MySQL, PostgreSQL)

VARP (SQL Server)

Oracle, SQL Server, MySQL, PostgreSQL Finds the population variance of a set of numbers.
VAR_SAMP Oracle, MySQL, PostgreSQL Finds the sample variance of a set of numbers.
VARIANCE (Oracle, MySQL, PostgreSQL)

VAR (SQL Server)

Oracle, SQL Server, MySQL, PostgreSQL Finds the variance of a set of numbers.
XMLAGG Oracle, PostgreSQL Aggregates XML documents and produces a single XML document.

 

So, that’s how you use aggregate functions in SQL. If you have any questions, please leave a comment at the end of this post.

4 thoughts on “SQL Aggregate Functions: A Guide”

  1. i think you need to revise your descriptions of RANK and DENSE_RANK. As I understand it, in both cases, rows with the same value get the same rank. The difference lies in the ranks given to values following a tie.
    Example:
    RANKs for (10, 20, 20, 30) are (1, 2, 2, 4).
    DENSE_RANKs for the same sequence are (1, 2, 2, 3)

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.