Oracle SQL Analytic FunctionsIn this article, I explain what Oracle SQL analytic functions are, how they are different from other functions, and show you some examples.

What Are Oracle SQL Analytic Functions?

Analytic functions are functions that calculate an aggregate value based on a group of rows.

A recent article I wrote on SQL aggregate functions mention that aggregate functions calculate an aggregate value for a result set, which must be grouped by the columns in the SELECT statement, or to find an overall value.

With analytic functions, however, you can group the data that is used to calculate the function’s result, but show a value one each record in the result set, without grouping.

An analytic function generally looks like this:

function_name ( arguments ) OVER ( [query_partition_clause] 
[ORDER BY order_by_clause [windowing_clause] ] )

Many of the aggregate functions can also be used as an analytic function.

 

SQL Analytic Function Example

Let’s take a look at an example.

If you had a table that looked like this:

SELECT * FROM student;
STUDENT_ID FIRST_NAME LAST_NAME FEES_REQUIRED FEES_PAID ENROLMENT_DATE GENDER ADDRESS_STATE
1 John Smith 500 100 1/Feb/15 M New York
2 Susan Johnson 150 150 12/Jan/15 F Colorado
3 Tom Capper 350 320 6/Mar/15 M Nevada
4 Mark Holloway 500 410 20/Jan/15 M New York
5 Steven Webber 100 80 9/Mar/15 M New York
6 Julie Armstrong 100 0 12/Feb/15 F Texas
7 Michelle Randall 250 23/Jan/15 F Florida
8 Andrew Cooper 800 400 4/Mar/15 M Texas
9 Robert Pickering 110 100 30/Jan/15 M Colorado
10 Tanya Hall 150 150 28/Jan/15 F Texas

What if you want to find the number of students in each state, and display that next to each student record?

SELECT address_state,
COUNT(*)
FROM student
GROUP BY address_state;
ADDRESS_STATE COUNT(*)
Texas 3
New York 3
Colorado 2
Nevada 1
Florida 1

This would work, but it would only show the states and the count. It doesn’t show the student information.

I could add the student information, but then the COUNT(*) records would only show 1, because they are grouped by the student’s first_name and last_name.

This is where analytic functions come in.

You can use the COUNT function as an analytic function

The COUNT function as an analytic function would look like this:

COUNT(*) OVER (PARTITION BY address_state)

This will find the COUNT of records for each state – but will not require any grouping in the SELECT statement.

SELECT first_name,
last_name,
address_state,
COUNT(*) OVER (PARTITION BY address_state) AS state_count
FROM student;
FIRST_NAME LAST_NAME ADDRESS_STATE STATE_COUNT
Robert Pickering Colorado 2
Susan Johnson Colorado 2
Michelle Randall Florida 1
Tom Capper Nevada 1
Mark Holloway New York 3
John Smith New York 3
Steven Webber New York 3
Andrew Cooper Texas 3
Tanya Hall Texas 3
Julie Armstrong Texas 3

As you can see, all of the student records are shown, along with their state, and the number of records that state contains.

 

What Is The PARTITION BY Clause?

As you can see in the earlier example, the PARTITION BY specifies how the function groups the data before calculating the result.

It works similar to a GROUP BY, but it doesn’t reduce the number of rows that are shown in the result set.

If you don’t include the PARTITION BY, then you’ll get this result:

SELECT first_name, last_name, address_state,
COUNT(*) OVER () AS state_count
FROM student;
FIRST_NAME LAST_NAME ADDRESS_STATE STATE_COUNT
John Smith New York 10
Susan Johnson Colorado 10
Tom Capper Nevada 10
Mark Holloway New York 10
Steven Webber New York 10
Julie Armstrong Texas 10
Michelle Randall Florida 10
Andrew Cooper Texas 10
Robert Pickering Colorado 10
Tanya Hall Texas 10

The state_count column shows an overall COUNT for all records, and displays it for each row.

 

Why Can’t I Just Use Subqueries?

You might have noticed that you can get the same results using subqueries and joins.

This is true. But, there are some advantages of using Oracle SQL analytic functions instead of subqueries and joins to get this result.

  • Easier to write. Once you understand how analytic functions work, they are easier to write than subqueries and joins to get the same result.
  • Easier to maintain. If you have a complex query, it can be harder to maintain it if you are using subqueries to get the same result that an analytic function would.
  • May be faster. Most of the time, using Oracle’s built-in functionality is faster than writing your own. Using these analytic functions will likely make your code run faster than using subqueries.

 

When Are Analytic Functions Performed?

When Oracle processes a query, the analytic functions are the last set of operations performed, except for the ORDER BY clause. This means that the joins, the WHERE clause, GROUP BY clause, and HAVING clause are all performed first, then the analytic functions are performed.

This means that the analytic functions can only appear in the SELECT list or the ORDER BY clause.

 

Can You Nest Analytic Functions?

Yes and no.

While aggregate functions allow you to nest functions, you can’t do the same thing with analytic functions.

However, you can specify an analytic function inside a subquery, and then perform another analytic function on that column inside your main query.

 

Can You Use Multiple Analytic Functions in the Same Query?

Yes, you can.

You can have different analytic functions in the same SELECT statement, and they can have the same query_partition_clause or different query_partition_clause values.

You can also have the same analytic functions but use different query_partition_clause values as well.

 

What Is The ORDER_BY_Clause Within Analytic Functions?

The order_by_clause within an analytic function is different to the ORDER BY clause for the entire query.

This clause specifies how data is ordered within a partition.

For some analytic functions, such as COUNT and MAX, the order does not matter.

However, other functions such as LEAD, LAG, and RANK, the order does matter. So, this is how the ordering is done for those functions.

The order_by_clause looks like this:

ORDER BY expr [ASC | DESC] [NULLS [FIRST | LAST ] ]

In this clause:

  • expr: This is the expression or column to order by.
  • ASC|DESC: This specifies that the expr should be ordered in ascending or descending order. The default is ascending.
  • NULLS FIRST|LAST: This specifies where rows with NULL values should appear – either first in the list, or last. For ascending order sorts, NULLS LAST is the default. For descending order sorts, NULLS FIRST is the default.

 

What Is The Windowing Clause?

The windowing_clause allows you to specify a range of rows that are used to perform the calculations for the current row.

The syntax of the windowing_clause is quite complicated, but here it is:

 ROWS|RANGE BETWEEN start_expr AND end_expr

The start_expr can be any of:

  • UNBOUNDED_PRECEDING
  • CURRENT ROW
  • expr PRECEDING|FOLLOWING

And the end_expr can be any of:

  • UNBOUNDED_FOLLOWING
  • CURRENT ROW
  • expr PRECEDING|FOLLOWING

The ROWS and RANGE keywords specify the window for each row for calculating the result of the function. ROWS specifies the window using rows, and RANGE specifies the window as a logical offset.

Let’s take a look at an example using the ROW type syntax.

SELECT first_name,
last_name,
address_state,
COUNT(*) OVER (PARTITION BY address_state 
ORDER BY address_state ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING) AS state_count
FROM student;
FIRST_NAME LAST_NAME ADDRESS_STATE STATE_COUNT
Robert Pickering Colorado 2
Susan Johnson Colorado 2
Michelle Randall Florida 1
Tom Capper Nevada 1
Mark Holloway New York 2
John Smith New York 3
Steven Webber New York 3
Andrew Cooper Texas 2
Tanya Hall Texas 3
Julie Armstrong Texas 3

You can see here that the state_count is a little different. It only counts the records in the range that has been mentioned in the ROWS clause.

The ROWS BETWEEN 2 PRECEDING AND 1 FOLLOWING means that rows are only counted where the address_state equals the current records address_state, and only for the two rows before it and the one row after it.

This is why some of the Texas rows have a state_count of 2 and others are 3.

What Is The List Of Oracle SQL Analytic Functions?

Here’s a list of the Oracle SQL analytic functions. Many of them are also aggregate functions.

Function Definition
AVG Finds the average of the expression.
CORR Finds the coefficient of correlation of a set of number pairs.
COUNT Finds the number of rows returned by a query.
COVAR_POP Finds the population covariance of a set of number pairs.
COVAR_SAMP Finds the sample covariance of a set of number pairs.
CUME_DIST Finds the cumulative distribution of a value in a group of values.
DENSE_RANK Finds the rank of a row in an ordered group of rows, and rows with the same value get different ranks.
FIRST Used with ranking functions to get the first value.
FIRST_VALUE Finds the first value in an ordered set of values.
LAG Get data from the preceding row without using a join
LAST Used with ranking functions to get the last value.
LAST_VALUE Finds the last value in an ordered set of values.
LEAD Get data from the following row without using a join
LISTAGG Orders data within a set of data and then concatenates it with a specified character.
MAX Finds the maximum value of the expression.
MEDIAN Finds the middle value of a set of data.
MIN Finds the minimum value of the expression.
NTH_VALUE Returns the nth value in an ordered set of values
NTILE Divides an ordered data set into buckets and assigns a bucket number to each row
PERCENT_RANK Finds the percentage rank of a row, similar to the CUME_DIST function.
PERCENTILE_CONT Finds a value that would fall within the specified range using a specified percentage value and sort method.
PERCENTILE_DISC Finds a value that would fall within the specified range using a specified percentage value and sort method.
RANK Calculates a rank of a value in a group of values, and rows with the same value get the same rank
RATIO_TO_REPORT Finds the ratio of a value to the sum of values.
REGR_ Functions Fits an ordinary-least-squares regression line ot a set of number pairs.
ROW_NUMBER Assigns a unique number to each row.
STDDEV Finds the standard deviation of the set of values.
STDDEV_POP Finds the population standard deviation of a set of values.
STDDEV_SAMP Finds the cumulative sample standard deviation of a set of values.
SUM Finds the sum of values.
VAR_POP Finds the population variance of a set of numbers.
VAR_SAMP Finds the sample variance of a set of numbers.
VARIANCE Finds the variance of a set of numbers.

 

Conclusion

So, in conclusion, analytic functions can be used to find totals of data within different groups. They are very useful for analysing data, whether you’re getting data for an application or working on a data warehouse.

Lastly, if you enjoy the information and career advice I’ve been providing, sign up to my newsletter below to stay up-to-date on my articles. You’ll also receive a fantastic bonus. Thanks!

Improve Your Oracle SQL With My 10-Day Email Course

x