FB

In this article, I explain what SQL analytic functions are, how they are different from other functions, and show you some examples.

 

What Are SQL Analytic Functions?

Analytic functions are functions that calculate an aggregate value based on a group of rows.

A recent article I wrote on SQL aggregate functions mention that aggregate functions calculate an aggregate value for a result set, which must be grouped by the columns in the SELECT statement or to find an overall value.

With analytic functions, however, you can group the data that is used to calculate the function’s result, but show a value one each record in the result set, without grouping.

An analytic function generally looks like this:

function_name ( arguments ) OVER ( [query_partition_clause] 
[ORDER BY order_by_clause [windowing_clause] ] )

Many of the aggregate functions can also be used as an analytic function.

 

SQL Analytic Function Example

Let’s take a look at an example.

If you had a table that looked like this:

SELECT * FROM student;
STUDENT_ID FIRST_NAME LAST_NAME FEES_PAID ADDRESS_STATE
1 John Smith 100 New York
2 Susan Johnson 150 Colorado
3 Tom Capper 320 Nevada
4 Mark Holloway 410 New York
5 Steven Webber 80 New York
6 Julie Armstrong 0 Texas
7 Michelle Randall Florida
8 Andrew Cooper 400 Texas
9 Robert Pickering 100 Colorado
10 Tanya Hall 150 Texas

What if you want to find the number of students in each state, and display that next to each student record?

SELECT address_state,
COUNT(*)
FROM student
GROUP BY address_state;
ADDRESS_STATE COUNT(*)
Texas 3
New York 3
Colorado 2
Nevada 1
Florida 1

This would work, but it would only show the states and the count. It doesn’t show the student information.

I could add the student information, but then the COUNT(*) records would only show 1 because they are grouped by the student’s first_name and last_name.

This is where analytic functions come in.

You can use the COUNT function as an analytic function

The COUNT function as an analytic function would look like this:

COUNT(*) OVER (PARTITION BY address_state)

This will find the COUNT of records for each state – but will not require any grouping in the SELECT statement.

SELECT first_name,
last_name,
address_state,
COUNT(*) OVER (PARTITION BY address_state) AS state_count
FROM student;
FIRST_NAME LAST_NAME ADDRESS_STATE STATE_COUNT
Robert Pickering Colorado 2
Susan Johnson Colorado 2
Michelle Randall Florida 1
Tom Capper Nevada 1
Mark Holloway New York 3
John Smith New York 3
Steven Webber New York 3
Andrew Cooper Texas 3
Tanya Hall Texas 3
Julie Armstrong Texas 3

As you can see, all of the student records are shown, along with their state, and the number of records that the state contains.

 

What Is The PARTITION BY Clause?

As you can see in the earlier example, the PARTITION BY specifies how the function groups the data before calculating the result.

It works similar to a GROUP BY, but it doesn’t reduce the number of rows that are shown in the result set.

If you don’t include the PARTITION BY, then you’ll get this result:

SELECT first_name, last_name, address_state,
COUNT(*) OVER () AS state_count
FROM student;
FIRST_NAME LAST_NAME ADDRESS_STATE STATE_COUNT
John Smith New York 10
Susan Johnson Colorado 10
Tom Capper Nevada 10
Mark Holloway New York 10
Steven Webber New York 10
Julie Armstrong Texas 10
Michelle Randall Florida 10
Andrew Cooper Texas 10
Robert Pickering Colorado 10
Tanya Hall Texas 10

The state_count column shows an overall COUNT for all records and displays it for each row.

 

Frequently Asked Questions

Here are some frequently asked questions, or things you should know, about SQL analytic functions.

Why Can’t I Just Use Subqueries?

You might have noticed that you can get the same results using subqueries and joins.

This is true. But, there are some advantages of using SQL analytic functions instead of subqueries and joins to get this result.

  • Easier to write. Once you understand how analytic functions work, they are easier to write than subqueries and joins to get the same result.
  • Easier to maintain. If you have a complex query, it can be harder to maintain it if you are using subqueries to get the same result that an analytic function would.
  • May be faster. Most of the time, using the database’s built-in functionality is faster than writing your own. Using these analytic functions will likely make your code run faster than using subqueries.

 

When Are Analytic Functions Performed?

When the database processes a query, the analytic functions are the last set of operations performed, except for the ORDER BY clause. This means that the joins, the WHERE clause, GROUP BY clause, and HAVING clause are all performed first, then the analytic functions are performed.

This also means that the analytic functions can only appear in the SELECT list or the ORDER BY clause.

 

Can You Nest Analytic Functions?

Yes and no.

While aggregate functions allow you to nest functions, you can’t do the same thing with analytic functions.

However, you can specify an analytic function inside a subquery, and then perform another analytic function on that column inside your main query.

 

Can You Use Multiple Analytic Functions in the Same Query?

Yes, you can.

You can have different analytic functions in the same SELECT statement, and they can have the same query_partition_clause or different query_partition_clause values.

You can also have the same analytic functions but use different query_partition_clause values as well.

What Is The ORDER_BY_Clause Within Analytic Functions?

The order_by_clause within an analytic function is different from the ORDER BY clause for the entire query.

This clause specifies how data is ordered within a partition.

For some analytic functions, such as COUNT and MAX, the order does not matter.

However, in other functions such as LEAD, LAG, and RANK, the order does matter. So, this is how the ordering is done for those functions.

The order_by_clause looks like this:

ORDER BY expression [ASC | DESC] [NULLS [FIRST | LAST ] ]

In this clause:

  • expression: This is the expression or column to order by.
  • ASC|DESC: This specifies that the expr should be ordered in ascending or descending order. The default is ascending.
  • NULLS FIRST|LAST: This specifies where rows with NULL values should appear – either first in the list, or last. For ascending order sorts, NULLS LAST is the default. For descending order sorts, NULLS FIRST is the default.

 

What Is The Windowing Clause?

The windowing_clause allows you to specify a range of rows that are used to perform the calculations for the current row. This is part of the window function syntax.

The syntax of the windowing_clause is quite complicated, but here it is:

 ROWS|RANGE BETWEEN start_expression AND end_expression

The start_expression can be any of:

  • UNBOUNDED_PRECEDING
  • CURRENT ROW
  • expression PRECEDING|FOLLOWING

And the end_expression can be any of:

  • UNBOUNDED_FOLLOWING
  • CURRENT ROW
  • expression PRECEDING|FOLLOWING

The ROWS and RANGE keywords specify the window for each row for calculating the result of the function. ROWS specifies the window using rows, and RANGE specifies the window as a logical offset.

Let’s take a look at an example using the ROW type syntax.

SELECT first_name,
last_name,
address_state,
COUNT(*) OVER (PARTITION BY address_state 
ORDER BY address_state ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING) AS state_count
FROM student;
FIRST_NAME LAST_NAME ADDRESS_STATE STATE_COUNT
Robert Pickering Colorado 2
Susan Johnson Colorado 2
Michelle Randall Florida 1
Tom Capper Nevada 1
Mark Holloway New York 2
John Smith New York 3
Steven Webber New York 3
Andrew Cooper Texas 2
Tanya Hall Texas 3
Julie Armstrong Texas 3

You can see here that the state_count column is a little different. It only counts the records in the range that has been mentioned in the ROWS clause.

The ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING means that rows are only counted where the address_state equals the current records address_state, and only for the two rows before it and the one row after it.

This is why some of the Texas rows have a state_count of 2 and others are 3.

 

List Of SQL Analytic Functions

Here’s a list of the SQL analytic functions. Many of them are also aggregate functions.

Function Database Definition
AVG Oracle, SQL Server, MySQL, PostgreSQL Finds the average of the expression.
CORR Oracle, PostgreSQL Finds the coefficient of correlation of a set of number pairs.
COUNT Oracle, SQL Server, MySQL, PostgreSQL Finds the number of rows returned by a query.
COVAR_POP Oracle, PostgreSQL Finds the population covariance of a set of number pairs.
COVAR_SAMP Oracle, PostgreSQL Finds the sample covariance of a set of number pairs.
CUME_DIST Oracle, SQL Server, MySQL, PostgreSQL Finds the cumulative distribution of a value in a group of values.
DENSE_RANK Oracle, MySQL, PostgreSQL Finds the rank of a row in an ordered group of rows, and rows with the same value get the same ranks but rank numbers are not skipped. This can result in a ranking of 1, 2, 3, 3, 4 (rank 4 is still used, even if there is a tie for rank 3).
FIRST (Oracle)

FIRST_VALUE (SQL Server, MySQL)

Oracle, SQL Server, MySQL Used with ranking functions to get the first value.
LAG Oracle, SQL Server, MySQL, PostgreSQL Get data from the preceding row without using a join
LAST (Oracle)

LAST_VALUE (SQL Server, MySQL)

Oracle, SQL Server, MySQL Used with ranking functions to get the last value.
LEAD Oracle, SQL Server, MySQL, PostgreSQL Get data from the following row without using a join
LISTAGG Oracle Orders data within a set of data and then concatenates it with a specified character.
MAX Oracle, SQL Server, MySQL, PostgreSQL Finds the maximum value of the expression.
MEDIAN Oracle Finds the middle value of a set of data.
MIN Oracle, SQL Server, MySQL, PostgreSQL Finds the minimum value of the expression.
NTH_VALUE Oracle Returns the nth value in an ordered set of values
NTILE Oracle Divides an ordered data set into buckets and assigns a bucket number to each row
PERCENT_RANK Oracle, SQL Server, MySQL, PostgreSQL Finds the percentage rank of a row, similar to the CUME_DIST function.
PERCENTILE_CONT Oracle, SQL Server, PostgreSQL Finds a value that would fall within the specified range using a specified percentage value and sort method.
PERCENTILE_DISC Oracle, SQL Server, PostgreSQL Finds a value that would fall within the specified range using a specified percentage value and sort method.
RANK Oracle, MySQL, PostgreSQL Calculates a rank of a value in a group of values, and rows with the same value get the same rank with rank numbers getting skipped. This can result in a ranking of 1, 2, 3, 3, 5 (rank 4 is skipped because there is a tie).
RATIO_TO_REPORT Oracle Finds the ratio of a value to the sum of values.
REGR_ Functions Oracle, PostgreSQL Fits an ordinary-least-squares regression line to a set of number pairs.
ROW_NUMBER Oracle Assigns a unique number to each row.
STDDEV (Oracle, MySQL, PostgreSQL)

STDEV (SQL Server)

Oracle, SQL Server, MySQL, PostgreSQL Finds the standard deviation of the set of values.
STDDEV_POP (Oracle, MySQL, PostgreSQL)

STDEVP (SQL Server)

STD (MySQL)

Oracle, SQL Server, MySQL, PostgreSQL Finds the population standard deviation of a set of values.
STDDEV_SAMP Oracle, MySQL, PostgreSQL Finds the cumulative sample standard deviation of a set of values.
SUM Oracle, SQL Server, MySQL, PostgreSQL Finds the sum of values.
VAR_POP (Oracle, MySQL, PostgreSQL)

VARP (SQL Server)

Oracle, SQL Server, MySQL, PostgreSQL Finds the population variance of a set of numbers.
VAR_SAMP Oracle, MySQL, PostgreSQL Finds the sample variance of a set of numbers.
VARIANCE (Oracle, MySQL, PostgreSQL)

VAR (SQL Server)

Oracle, SQL Server, MySQL, PostgreSQL Finds the variance of a set of numbers.

 

Conclusion

So, in conclusion, analytic functions can be used to find totals of data within different groups. They are very useful for analysing data, whether you’re getting data for an application or working on a data warehouse.

Share via
Copy link
Powered by Social Snap