In this article, I explain what SQL analytic functions are, how they are different from other functions, and show you some examples.
Table of Contents
What Are SQL Analytic Functions?
Analytic functions are functions that calculate an aggregate value based on a group of rows.
A recent article I wrote on SQL aggregate functions mention that aggregate functions calculate an aggregate value for a result set, which must be grouped by the columns in the SELECT statement or to find an overall value.
With analytic functions, however, you can group the data that is used to calculate the function’s result, but show a value one each record in the result set, without grouping.
An analytic function generally looks like this:
function_name ( arguments ) OVER ( [query_partition_clause]
[ORDER BY order_by_clause [windowing_clause] ] )
Many of the aggregate functions can also be used as an analytic function.
SQL Analytic Function Example
Let’s take a look at an example.
If you had a table that looked like this:
SELECT * FROM student;
STUDENT_ID | FIRST_NAME | LAST_NAME | FEES_PAID | ADDRESS_STATE |
1 | John | Smith | 100 | New York |
2 | Susan | Johnson | 150 | Colorado |
3 | Tom | Capper | 320 | Nevada |
4 | Mark | Holloway | 410 | New York |
5 | Steven | Webber | 80 | New York |
6 | Julie | Armstrong | 0 | Texas |
7 | Michelle | Randall | Florida | |
8 | Andrew | Cooper | 400 | Texas |
9 | Robert | Pickering | 100 | Colorado |
10 | Tanya | Hall | 150 | Texas |
What if you want to find the number of students in each state, and display that next to each student record?
SELECT address_state,
COUNT(*)
FROM student
GROUP BY address_state;
ADDRESS_STATE | COUNT(*) |
Texas | 3 |
New York | 3 |
Colorado | 2 |
Nevada | 1 |
Florida | 1 |
This would work, but it would only show the states and the count. It doesn’t show the student information.
I could add the student information, but then the COUNT(*) records would only show 1 because they are grouped by the student’s first_name and last_name.
This is where analytic functions come in.
You can use the COUNT function as an analytic function
The COUNT function as an analytic function would look like this:
COUNT(*) OVER (PARTITION BY address_state)
This will find the COUNT of records for each state – but will not require any grouping in the SELECT statement.
SELECT first_name,
last_name,
address_state,
COUNT(*) OVER (PARTITION BY address_state) AS state_count
FROM student;
FIRST_NAME | LAST_NAME | ADDRESS_STATE | STATE_COUNT |
Robert | Pickering | Colorado | 2 |
Susan | Johnson | Colorado | 2 |
Michelle | Randall | Florida | 1 |
Tom | Capper | Nevada | 1 |
Mark | Holloway | New York | 3 |
John | Smith | New York | 3 |
Steven | Webber | New York | 3 |
Andrew | Cooper | Texas | 3 |
Tanya | Hall | Texas | 3 |
Julie | Armstrong | Texas | 3 |
As you can see, all of the student records are shown, along with their state, and the number of records that the state contains.
What Is The PARTITION BY Clause?
As you can see in the earlier example, the PARTITION BY specifies how the function groups the data before calculating the result.
It works similar to a GROUP BY, but it doesn’t reduce the number of rows that are shown in the result set.
If you don’t include the PARTITION BY, then you’ll get this result:
SELECT first_name, last_name, address_state,
COUNT(*) OVER () AS state_count
FROM student;
FIRST_NAME | LAST_NAME | ADDRESS_STATE | STATE_COUNT |
John | Smith | New York | 10 |
Susan | Johnson | Colorado | 10 |
Tom | Capper | Nevada | 10 |
Mark | Holloway | New York | 10 |
Steven | Webber | New York | 10 |
Julie | Armstrong | Texas | 10 |
Michelle | Randall | Florida | 10 |
Andrew | Cooper | Texas | 10 |
Robert | Pickering | Colorado | 10 |
Tanya | Hall | Texas | 10 |
The state_count column shows an overall COUNT for all records and displays it for each row.
Frequently Asked Questions
Here are some frequently asked questions, or things you should know, about SQL analytic functions.
Why Can’t I Just Use Subqueries?
You might have noticed that you can get the same results using subqueries and joins.
This is true. But, there are some advantages of using SQL analytic functions instead of subqueries and joins to get this result.
- Easier to write. Once you understand how analytic functions work, they are easier to write than subqueries and joins to get the same result.
- Easier to maintain. If you have a complex query, it can be harder to maintain it if you are using subqueries to get the same result that an analytic function would.
- May be faster. Most of the time, using the database’s built-in functionality is faster than writing your own. Using these analytic functions will likely make your code run faster than using subqueries.
When Are Analytic Functions Performed?
When the database processes a query, the analytic functions are the last set of operations performed, except for the ORDER BY clause. This means that the joins, the WHERE clause, GROUP BY clause, and HAVING clause are all performed first, then the analytic functions are performed.
This also means that the analytic functions can only appear in the SELECT list or the ORDER BY clause.
Can You Nest Analytic Functions?
Yes and no.
While aggregate functions allow you to nest functions, you can’t do the same thing with analytic functions.
However, you can specify an analytic function inside a subquery, and then perform another analytic function on that column inside your main query.
Can You Use Multiple Analytic Functions in the Same Query?
Yes, you can.
You can have different analytic functions in the same SELECT statement, and they can have the same query_partition_clause or different query_partition_clause values.
You can also have the same analytic functions but use different query_partition_clause values as well.
What Is The ORDER_BY_Clause Within Analytic Functions?
The order_by_clause within an analytic function is different from the ORDER BY clause for the entire query.
This clause specifies how data is ordered within a partition.
For some analytic functions, such as COUNT and MAX, the order does not matter.
However, in other functions such as LEAD, LAG, and RANK, the order does matter. So, this is how the ordering is done for those functions.
The order_by_clause looks like this:
ORDER BY expression [ASC | DESC] [NULLS [FIRST | LAST ] ]
In this clause:
- expression: This is the expression or column to order by.
- ASC|DESC: This specifies that the expr should be ordered in ascending or descending order. The default is ascending.
- NULLS FIRST|LAST: This specifies where rows with NULL values should appear – either first in the list, or last. For ascending order sorts, NULLS LAST is the default. For descending order sorts, NULLS FIRST is the default.
What Is The Windowing Clause?
The windowing_clause allows you to specify a range of rows that are used to perform the calculations for the current row. This is part of the window function syntax.
The syntax of the windowing_clause is quite complicated, but here it is:
ROWS|RANGE BETWEEN start_expression AND end_expression
The start_expression can be any of:
- UNBOUNDED_PRECEDING
- CURRENT ROW
- expression PRECEDING|FOLLOWING
And the end_expression can be any of:
- UNBOUNDED_FOLLOWING
- CURRENT ROW
- expression PRECEDING|FOLLOWING
The ROWS and RANGE keywords specify the window for each row for calculating the result of the function. ROWS specifies the window using rows, and RANGE specifies the window as a logical offset.
Let’s take a look at an example using the ROW type syntax.
SELECT first_name,
last_name,
address_state,
COUNT(*) OVER (PARTITION BY address_state
ORDER BY address_state ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING) AS state_count
FROM student;
FIRST_NAME | LAST_NAME | ADDRESS_STATE | STATE_COUNT |
Robert | Pickering | Colorado | 2 |
Susan | Johnson | Colorado | 2 |
Michelle | Randall | Florida | 1 |
Tom | Capper | Nevada | 1 |
Mark | Holloway | New York | 2 |
John | Smith | New York | 3 |
Steven | Webber | New York | 3 |
Andrew | Cooper | Texas | 2 |
Tanya | Hall | Texas | 3 |
Julie | Armstrong | Texas | 3 |
You can see here that the state_count column is a little different. It only counts the records in the range that has been mentioned in the ROWS clause.
The ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING means that rows are only counted where the address_state equals the current records address_state, and only for the two rows before it and the one row after it.
This is why some of the Texas rows have a state_count of 2 and others are 3.
List Of SQL Analytic Functions
Here’s a list of the SQL analytic functions. Many of them are also aggregate functions.
Function | Database | Definition |
AVG | Oracle, SQL Server, MySQL, PostgreSQL | Finds the average of the expression. |
CORR | Oracle, PostgreSQL | Finds the coefficient of correlation of a set of number pairs. |
COUNT | Oracle, SQL Server, MySQL, PostgreSQL | Finds the number of rows returned by a query. |
COVAR_POP | Oracle, PostgreSQL | Finds the population covariance of a set of number pairs. |
COVAR_SAMP | Oracle, PostgreSQL | Finds the sample covariance of a set of number pairs. |
CUME_DIST | Oracle, SQL Server, MySQL, PostgreSQL | Finds the cumulative distribution of a value in a group of values. |
DENSE_RANK | Oracle, MySQL, PostgreSQL | Finds the rank of a row in an ordered group of rows, and rows with the same value get the same ranks but rank numbers are not skipped. This can result in a ranking of 1, 2, 3, 3, 4 (rank 4 is still used, even if there is a tie for rank 3). |
FIRST (Oracle)
FIRST_VALUE (SQL Server, MySQL) |
Oracle, SQL Server, MySQL | Used with ranking functions to get the first value. |
LAG | Oracle, SQL Server, MySQL, PostgreSQL | Get data from the preceding row without using a join |
LAST (Oracle)
LAST_VALUE (SQL Server, MySQL) |
Oracle, SQL Server, MySQL | Used with ranking functions to get the last value. |
LEAD | Oracle, SQL Server, MySQL, PostgreSQL | Get data from the following row without using a join |
LISTAGG | Oracle | Orders data within a set of data and then concatenates it with a specified character. |
MAX | Oracle, SQL Server, MySQL, PostgreSQL | Finds the maximum value of the expression. |
MEDIAN | Oracle | Finds the middle value of a set of data. |
MIN | Oracle, SQL Server, MySQL, PostgreSQL | Finds the minimum value of the expression. |
NTH_VALUE | Oracle | Returns the nth value in an ordered set of values |
NTILE | Oracle | Divides an ordered data set into buckets and assigns a bucket number to each row |
PERCENT_RANK | Oracle, SQL Server, MySQL, PostgreSQL | Finds the percentage rank of a row, similar to the CUME_DIST function. |
PERCENTILE_CONT | Oracle, SQL Server, PostgreSQL | Finds a value that would fall within the specified range using a specified percentage value and sort method. |
PERCENTILE_DISC | Oracle, SQL Server, PostgreSQL | Finds a value that would fall within the specified range using a specified percentage value and sort method. |
RANK | Oracle, MySQL, PostgreSQL | Calculates a rank of a value in a group of values, and rows with the same value get the same rank with rank numbers getting skipped. This can result in a ranking of 1, 2, 3, 3, 5 (rank 4 is skipped because there is a tie). |
RATIO_TO_REPORT | Oracle | Finds the ratio of a value to the sum of values. |
REGR_ Functions | Oracle, PostgreSQL | Fits an ordinary-least-squares regression line to a set of number pairs. |
ROW_NUMBER | Oracle | Assigns a unique number to each row. |
STDDEV (Oracle, MySQL, PostgreSQL)
STDEV (SQL Server) |
Oracle, SQL Server, MySQL, PostgreSQL | Finds the standard deviation of the set of values. |
STDDEV_POP (Oracle, MySQL, PostgreSQL)
STDEVP (SQL Server) STD (MySQL) |
Oracle, SQL Server, MySQL, PostgreSQL | Finds the population standard deviation of a set of values. |
STDDEV_SAMP | Oracle, MySQL, PostgreSQL | Finds the cumulative sample standard deviation of a set of values. |
SUM | Oracle, SQL Server, MySQL, PostgreSQL | Finds the sum of values. |
VAR_POP (Oracle, MySQL, PostgreSQL)
VARP (SQL Server) |
Oracle, SQL Server, MySQL, PostgreSQL | Finds the population variance of a set of numbers. |
VAR_SAMP | Oracle, MySQL, PostgreSQL | Finds the sample variance of a set of numbers. |
VARIANCE (Oracle, MySQL, PostgreSQL)
VAR (SQL Server) |
Oracle, SQL Server, MySQL, PostgreSQL | Finds the variance of a set of numbers. |
Conclusion
So, in conclusion, analytic functions can be used to find totals of data within different groups. They are very useful for analysing data, whether you’re getting data for an application or working on a data warehouse.
Dear Ben,
first of all, thank you very much for your articles. This site makes learning process really exciting and easy.
Regarding this article, I’ve stucked on the subchapter: “Why Can’t I Just Use Subqueries?”
You wrote: “You might have noticed that you can get the same results using subqueries and joins.”
I am trying to get the same result by using SUBQUERIES+JOIN, but I am failing to succeed.
MY CODE (using Developer11g):
1. SELECT e.employee_id,
2. e.salary,
3. d.department_id,
4. d.avg_salary
5.
6. FROM (
7.
8. SELECT department_id,
9. AVG(salary) AS avg_salary
10. FROM employees
11. GROUP BY department_id) AS d
12. JOIN employees e
13. ON e.department_id=d.department_id;
I get the error:
ORA-00933: Sql command not properly ended
00933.00000- ” Sql command not properly ended”
Error at Line: 11 Column: 35
Please help.
Thank you in advance.