SQL aggregate functions are very useful and are some of the most commonly used functions. Learn all about SQL aggregate functions and what they do in this article.
What Are SQL Aggregate Functions?
Aggregate functions are functions that allow you to view a single piece of data from multiple pieces of data.
Most functions in SQL operate on a single row or a single record, such as DECODE or LENGTH.
However, aggregate functions are different because you can use them on multiple rows and get a single value as a result.
An example of an aggregate function is SUM. This function allows you to input a range of values, such as those in a column, and get the sum total of those numbers.
If you had a table that looked like this:
SELECT * FROM student;
STUDENT_ID | FIRST_NAME | LAST_NAME | FEES_REQUIRED | FEES_PAID | ADDRESS_STATE |
1 | John | Smith | 500 | 100 | New York |
2 | Susan | Johnson | 150 | 150 | Colorado |
3 | Tom | Capper | 350 | 320 | Nevada |
4 | Mark | Holloway | 500 | 410 | New York |
5 | Steven | Webber | 100 | 80 | New York |
6 | Julie | Armstrong | 100 | 0 | Texas |
7 | Michelle | Randall | 250 | (null) | Florida |
8 | Andrew | Cooper | 800 | 400 | Texas |
9 | Robert | Pickering | 110 | 100 | Colorado |
10 | Tanya | Hall | 150 | 150 | Texas |
You could then use an aggregate function, such as SUM, to get the total of all values in the fees_paid column.
SELECT SUM(fees_paid)
FROM student;
SUM(FEES_PAID) |
1710 |
This function collapses the many rows from the table into a single row that represents the SUM aggregate function.
There are many more aggregate functions, which we’ll look at in this article.
You can also do some interesting things with aggregate functions, such as selecting all rows that meet a maximum value or selecting the maximum value based on a subgroup.
What Is The GROUP BY Clause?
SQL has something called the “group by” clause. What does it mean? What does it do?
The GROUP BY clause allows you to use aggregate functions on parts of your data.
The example above showed the SUM of all of the fees_paid values in the table.
But what if you wanted to see the SUM of fees_paid for each address_state?
SELECT address_state,
SUM(fees_paid)
FROM student;
The result depends on which database you run this on:
Oracle:
ORA-00937: not a single-group group function
The ORA-00937 occurs whenever you use an aggregate function and don’t include all of the non-aggregate functions from the SELECT clause in a GROUP BY clause;
SQL Server:
Column ‘student.address_state’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
MySQL:
MySQL will actually show you a result. However, it shows the first address_state and the sum of all fees_paid records. This seems like it’s correct but it is not.
PostgreSQL:
ERROR: column “student.address_state” must appear in the GROUP BY clause or be used in an aggregate function Position: 8
You’ll get these errors because we haven’t grouped our data.
This is where the GROUP BY clause comes in.
The GROUP BY clause allows you to group your aggregate function by other values.
You’ll need to add GROUP BY, then the names of all of the other fields in the SELECT clause.
SELECT address_state,
SUM(fees_paid)
FROM student
GROUP BY address_state;
Result:
ADDRESS_STATE | SUM(FEES_PAID) |
Texas | 550 |
New York | 590 |
Colorado | 250 |
Nevada | 320 |
Florida | (null) |
This tells the database that you want to see the SUM of fees paid, but group by all of the address_state values.
What Is The HAVING Clause?
The HAVING clause is another clause that is relevant for aggregate functions.
It allows you to limit the results that are returned based on data after it has been grouped.
What if you wanted to see the SUM of the fees_paid values for each state, but only where the sum was over 500?
You could try use a WHERE clause and say WHERE fees_paid > 500.
SELECT address_state,
SUM(fees_paid)
FROM student
WHERE fees_paid > 500
GROUP BY address_state;
Result:
ADDRESS_STATE | SUM(FEES_PAID) |
However, in this case, you’ll get no results.
Why is this?
This is because the WHERE clause is run first, which eliminates rows that don’t meet the criteria. In this case, all rows where fees_paid <= 500 are removed from the result set.
Then, the SUM and GROUP BY are performed. However, there are no rows in the result set to do this to.
How can we get the result we want?
Use the HAVING clause.
The HAVING clause is similar to the WHERE clause, but it works on data after it has been grouped.
This is an important point to remember.
The WHERE clause works on data before it has been grouped, and the HAVING clause works on data after it has been grouped.
The syntax is similar to a WHERE clause.
SELECT address_state,
SUM(fees_paid)
FROM student
GROUP BY address_state
HAVING fees_paid > 500;
Here’s what you’ll see.
Oracle:
ORA-00979: not a GROUP BY expression
SQL Server:
Column ‘student.fees_paid’ is invalid in the HAVING clause because it is not contained in either an aggregate function or the GROUP BY clause.
MySQL:
Unknown column ‘fees_paid’ in ‘having clause’
PostgreSQL:
ERROR: column “student.fees_paid” must appear in the GROUP BY clause or be used in an aggregate function Position: 81
I’m getting this error because I’ve specified the fees_paid in the HAVING clause, and I need to specify the aggregate function as well.
So, instead of saying HAVING fees_paid, I need to say HAVING SUM(fees_paid), because that’s the value we are checking.
SELECT address_state,
SUM(fees_paid)
FROM student
GROUP BY address_state
HAVING SUM(fees_paid) > 500;
Result:
ADDRESS_STATE | SUM(FEES_PAID) |
Texas | 550 |
New York | 590 |
This now shows the result that we want.
DISTINCT, UNIQUE, and ALL with Aggregate Functions
SQL allows you to use several optional keywords within aggregate functions, which are:
- DISTINCT
- UNIQUE
- ALL
DISTINCT and UNIQUE are synonymous with each other so I’ll refer to it as DISTINCT in this article.
They are used in this way:
SUM ( [ALL | DISTINCT ] expression)
The ALL keyword is the default and means that the aggregate function will consider all values when performing its calculation.
The DISTINCT keyword can be specified, and it means the aggregate function will only consider distinct or unique values of the expression in the calculation. This feature is offered by Oracle, SQL Server, MySQL, and PostgreSQL (and possibly others).
Let’s see an example.
SELECT SUM(fees_paid)
FROM student;
Result:
SUM(FEES_PAID) |
1710 |
This shows the normal SUM function without DISTINCT or ALL.
SELECT SUM(ALL fees_paid)
FROM student;
Result:
SUM(ALLFEES_PAID) |
1710 |
If we add the ALL function, it shows the same as the first example. This is because ALL is the default.
Let’s add the DISTINCT keyword:
SELECT SUM(DISTINCT fees_paid)
FROM student;
SUM(DISTINCTFEES_PAID) |
1460 |
If we add the DISTINCT keyword, the SUM is smaller, because only distinct values are considered.
How Does the Database Handle NULL Values in Aggregate Functions?
In SQL, almost all of the aggregate functions ignore NULL values.
The functions that consider NULL values are:
- COUNT(*)
- GROUPING
- GROUPING_ID
If you want to include the records that have NULL values, you can convert them to another value using the NVL function.
What happens if you have NULLs in your data and you use an aggregate function on them?
If you use COUNT you’ll get either a number for the count of records or 0.
If you use any other function, and you have only NULL values in the data set, the function will return NULL.
Can You Use Nested Aggregate Functions?
Yes, you can.
Let’s say you have the SUM of fees_paid for students in each state, as we mentioned earlier.
SELECT address_state,
SUM(fees_paid)
FROM student
GROUP BY address_state;
Result:
ADDRESS_STATE | SUM(FEES_PAID) |
Texas | 550 |
New York | 590 |
Colorado | 250 |
Nevada | 320 |
Florida | (null) |
What if you wanted to find the average of the SUM of fees paid in each state?
You can wrap the SUM function in another aggregate function, such as AVG.
SELECT
AVG(SUM(fees_paid))
FROM student
GROUP BY address_state;
Result:
AVG(SUM(FEES_PAID) |
427.5 |
The calculation here will find the SUM of fees_paid for each address_state, and then get the AVG of those values.
However, as you might have noticed, you need to remove the field from the SELECT clause. This is because you’re trying to get the overall average, not the average per state.
List of SQL Aggregate Functions
Here’s a list of SQL aggregate functions in Oracle, MySQL, SQL Server, and PostgreSQL, with a short description of what they do.
I have written detailed guides on many functions, and where a guide exists, I’ve linked the function name to it in this table. So, if you click on the function name, you’ll find more information on how to use it and see some examples.
Function | Database | Definition |
APPROX_COUNT_DISTINCT | Oracle, SQL Server | Finds the approximate number of distinct rows in the expression. |
ARRAY_AGG | PostgreSQL | Collects all input values into an array |
AVG | Oracle, SQL Server, MySQL, PostgreSQL | Finds the average of the expression. |
BIT_AND | MySQL, PostgreSQL | Returns the result of a bitwise AND operation |
BIT_OR | MySQL, PostgreSQL | Returns the result of a bitwise OR operation |
BIT_XOR | MySQL | Returns the result of a bitwise XOR operation |
BOOL_AND | PostgreSQL | Returns true if all non-null input values are true, otherwise it returns false. |
BOOL_OR | PostgreSQL | Returns true if any non-null input value is true, otherwise it returns false. |
CHECKSUM_AGG | SQL Server | Finds the checksum of values in a group. |
COLLECT | Oracle | Creates a nested table from the provided column. |
CORR | Oracle, PostgreSQL | Finds the coefficient of correlation of a set of number pairs. |
CORR_K | Oracle | Calculates the Kendall’s tau-b correlation coefficient. |
CORR_S | Oracle | Calculates the Spearman’s rho correlation coefficient. |
COUNT | Oracle, SQL Server, MySQL, PostgreSQL | Finds the number of rows returned by a query. |
COUNT_BIG | SQL Server | Counts the number of values found. Similar to COUNT but returns a BIGINT. |
COVAR_POP | Oracle, PostgreSQL | Finds the population covariance of a set of number pairs. |
COVAR_SAMP | Oracle, PostgreSQL | Finds the sample covariance of a set of number pairs. |
CUME_DIST | Oracle, SQL Server, MySQL, PostgreSQL | Finds the cumulative distribution of a value in a group of values. |
DENSE_RANK | Oracle, MySQL, PostgreSQL | Finds the rank of a row in an ordered group of rows, and rows with the same value get the same ranks but rank numbers are not skipped. This can result in a ranking of 1, 2, 3, 3, 4 (rank 4 is still used, even if there is a tie for rank 3). |
EVERY | PostgreSQL | Same as BOOL_AND |
FIRST (Oracle)
FIRST_VALUE (SQL Server, MySQL) |
Oracle, SQL Server, MySQL | Used with ranking functions to get the first value. |
GROUP_CONCAT | MySQL | Concatenates a string of values. |
GROUP_ID | Oracle | Identifies an ID of a grouping level in GROUP BY clauses. |
GROUPING | Oracle, SQL Server | Identifies superaggregate rows from regular grouped rows. |
GROUPING_ID | Oracle, SQL Server | Identifies the field that has been grouped by. |
JSON_ARRAYAGG (MySQL)
JSON_AGG (PostgreSQL) |
MySQL, PostgreSQL | Aggregates a result set as a single JSON array. |
JSON_OBJECTAGG (MySQL)
JSON_OBJECT_AGG (PostgreSQL) |
MySQL | Aggregates a result set as a set of key-value pairs. |
LAST (Oracle)
LAST_VALUE (SQL Server, MySQL) |
Oracle, SQL Server, MySQL | Used with ranking functions to get the last value. |
LISTAGG | Oracle | Orders data within a set of data and then concatenates it with a specified character. |
MAX | Oracle, SQL Server, MySQL, PostgreSQL | Finds the maximum value of the expression. |
MEDIAN | Oracle | Finds the middle value of a set of data. |
MODE | PostgreSQL | Finds the most frequently-used value in a set of data. |
MIN | Oracle, SQL Server, MySQL, PostgreSQL | Finds the minimum value of the expression. |
PERCENT_RANK | Oracle, SQL Server, MySQL, PostgreSQL | Finds the percentage rank of a row, similar to the CUME_DIST function. |
PERCENTILE_CONT | Oracle, SQL Server, PostgreSQL | Finds a value that would fall within the specified range using a specified percentage value and sort method. |
PERCENTILE_DISC | Oracle, SQL Server, PostgreSQL | Finds a value that would fall within the specified range using a specified percentage value and sort method. |
RANK | Oracle, MySQL, PostgreSQL | Calculates a rank of a value in a group of values, and rows with the same value get the same rank with rank numbers getting skipped. This can result in a ranking of 1, 2, 3, 3, 5 (rank 4 is skipped because there is a tie). |
REGR_ Functions | Oracle, PostgreSQL | Fits an ordinary-least-squares regression line to a set of number pairs. |
STATS_BINOMIAL_TEST | Oracle | Tests the difference between a sample proportion and a given proportion. |
STATS_CROSSTAB | Oracle | Uses a method to analyse two variables using crosstab. |
STATS_F_TEST | Oracle | Checks whether two values are significantly different. |
STATS_KS_TEST | Oracle | Checks whether two samples are from the same population, or different populations with the same distribution. |
STATS_MODE | Oracle | Finds the value that occurs the most. |
STATS_MW_TEST | Oracle | Performs a Mann Whitney test on two populations. |
STATS_ONE_WAY_ANOVA | Oracle | Tests the difference in means for groups for statistical significance. |
STATS_T_TEST_* | Oracle | Checks the significance of a difference of means. |
STATS_WSR_TEST | Oracle | Performs a Wilcoxon Signed Ranks test to see if the median of the differences is significantly different from 0. |
STDDEV (Oracle, MySQL, PostgreSQL)
STDEV (SQL Server) |
Oracle, SQL Server, MySQL, PostgreSQL | Finds the standard deviation of the set of values. |
STDDEV_POP (Oracle, MySQL, PostgreSQL)
STDEVP (SQL Server) STD (MySQL) |
Oracle, SQL Server, MySQL, PostgreSQL | Finds the population standard deviation of a set of values. |
STDDEV_SAMP | Oracle, MySQL, PostgreSQL | Finds the cumulative sample standard deviation of a set of values. |
STRING_AGG | SQL Server, PostgreSQL | Concatenates a set of values with a specified separator |
SUM | Oracle, SQL Server, MySQL, PostgreSQL | Finds the sum of values. |
SYS_OP_ZONE_ID | Oracle | Finds a Zone ID from a specified row ID |
SYS_XMLAGG | Oracle | Aggregates XML documents and produces a single XML document. |
VAR_POP (Oracle, MySQL, PostgreSQL)
VARP (SQL Server) |
Oracle, SQL Server, MySQL, PostgreSQL | Finds the population variance of a set of numbers. |
VAR_SAMP | Oracle, MySQL, PostgreSQL | Finds the sample variance of a set of numbers. |
VARIANCE (Oracle, MySQL, PostgreSQL)
VAR (SQL Server) |
Oracle, SQL Server, MySQL, PostgreSQL | Finds the variance of a set of numbers. |
XMLAGG | Oracle, PostgreSQL | Aggregates XML documents and produces a single XML document. |
So, that’s how you use aggregate functions in SQL. If you have any questions, please leave a comment at the end of this post.
Very informative. Thank you
i think you need to revise your descriptions of RANK and DENSE_RANK. As I understand it, in both cases, rows with the same value get the same rank. The difference lies in the ranks given to values following a tie.
Example:
RANKs for (10, 20, 20, 30) are (1, 2, 2, 4).
DENSE_RANKs for the same sequence are (1, 2, 2, 3)
Hi Nelson, yes you’re right, thanks for letting me know. I’ll update the descriptions!
Thank you Mr Ben for this information
Thank you for this great article.
Thank you