FB

Oracle Variance Functions Guide, FAQ, and Examples

In this article, we’ll look at a range of Variance functions (VARIANCE, CUME_DIST, CORR, VAR_POP, COVAR_POP, VAR_SAMP, and COVAR_SAMP), explain what they do, and see some examples.

These functions can be used as an window function or an aggregate function. I’ll explain both formats in this article.

 

What is a Variance?

Before we look at the functions, let’s understand what a variance is.

A variance is a mathematical concept, which represents how different the values are in a set of values (such as  a column of data).

That link can probably explain it better than I can!

 

Purpose of the Oracle Variance Functions

There are quite a few variance functions in Oracle. Let’s take a look at what they all do.

VARIANCE

The purpose of the VARIANCE function is to, well, return the variance of a set of numbers.

CUME_DIST

The purpose of the CUME_DIST function is to find the cumulative distribution of a value in a list of values.

The value that is returned will be between 0 and 1.

CORR

The Oracle CORR function returns the “coefficient of correlation” of a set of number pairs.

What is a coefficient of correlation? Here’s an definition of what it means.

COVAR_POP

The Oracle COVAR_POP function calculates the population covariance of a set of number pairs.

What is a “population covariance”? Good question. Here’s an article that explains what the definition is.

VAR_SAMP

The VAR_SAMP function will return the sample variance of a set of numbers, after excluding NULLs.

What does this mean?

The sample variation calculates the variation of a sample of numbers from the overall set. The higher the result, the larger the variation between the numbers.

COVAR_SAMP

This function returns the covariance of a set of number pairs.

What does that mean? Here’s an explanation of what a covariance is.

VAR_POP

The VAR_POP function finds the population variance of a set of numbers, excluding NULLs.

What does this mean?

The population variance is a calculation that determines the difference or variation in a complete set of numbers. The higher the number is, the bigger the variance.

 

Oracle VARIANCE Function Syntax and Parameters

Aggregate Function

The syntax of the VARIANCE function when used as an aggregate function is:

VARIANCE ( [DISTINCT | ALL] expression)

The parameters of the VARIANCE aggregate function are:

  • DISTINCT or ALL (optional): If you specify ALL, then all values are considered for the variance. If you specify DISTINCT, only the distinct values are considered in the variance (duplicates are ignored).
  • expression (mandatory): This is the expression (such as a column) to be analysed for variance.

The data type can be any numeric data type, and the return value is the same data type that was provided to it.

 

Analytic Function

If you’re using the VARIANCE function as an analytic function, then the syntax is:

VARIANCE ( [DISTINCT | ALL] expression) OVER (analytic_clause)

The parameters of VARIANCE when using it as an analytic function are:

  • DISTINCT or ALL (optional): If you specify ALL, then all values are considered for the variance. If you specify DISTINCT, only the distinct values are considered in the variance (duplicates are ignored).
  • expression (mandatory): This is the expression (such as a column) to be analysed for variance.
  • analytic_clause (mandatory): This is where you specify a query partition clause, order by clause,, and windowing clause.

If you specify DISTINCT, then you are only able to specify the query partition clause inside the analytic clause. The order by clause and windowing clause are not allowed.

 

Oracle CUME_DIST Syntax and Parameters

Aggregate Function

The syntax for this function is different depending on how you use it.

As an aggregate function, it looks like this:

CUME_DIST (expression1, … expression_n) WITHIN GROUP (ORDER BY expression_order1, … expression_order_n)

The parameters of the CUME_DIST function as an aggregate function are:

  • expression1 (mandatory): The list of expressions to evaluate a distribution for.
  • expression_order1 (mandatory): The list of expressions that contain the larger list of values. Expression1 is checked within expression_order1.

You can have more than one expression in each set of parameters, separated by a comma. However, the data types need to match between them.

So, the data type of expression1 needs to match expression_order1, and the data type of expression2 needs to match expression_order2, and so on.

 

Analytic Function

As an analytic function, it looks like this:

CUME_DIST() OVER ( [query_partition_clause] ORDER BY order_clause )

When it is run as an analytic function, it performs the cumulative distribution for the row’s value against a range of values. The result is likely different for each row.

The parameters for the analytic version of CUME_DIST are:

  • query_partition_clause (optional): You can specify how you want your CUME_DIST value grouped. If this is not specified, the function looks at all values.
  • order_by_clause (mandatory): This is the column used to calculate the cumulative distribution.

 

Oracle CORR Function Syntax and Parameters

Aggregate Function

The syntax for CORR as an aggregate function is:

CORR (expression1, expression2)

The parameters of the CORR aggregate function are:

  • expression1 (mandatory): The first set of data to use in the coefficient calculation.
  • expression2 (mandatory): The second set of data to use in the coefficient calculation.

Both of these fields are mandatory because the calculation needs two sets of data.

Analytic Function

The syntax of the CORR function as an analytic function is:

CORR (expression1, expression2) OVER (analytic_clause)

Just like the aggregate function, both expressions are required. The analytic_clause is new.

The analytic_clause allows you to specify the query partition clause, the order by clause, and the windowing clause.

 

Oracle VAR_POP Function Syntax and Parameters

The syntax of the VAR_POP function is:

VAR_POP ( expression ) [OVER (analytic_clause)]

Parameters

  • expression (mandatory): This is the expression, such as a set of data or a column, to perform the population variance calculation on.
  • analytic_clause (optional): This is the clause that determines how the analytic function behaves, if using this as an analytic function.

The expression can be any numeric data type, and returns the same data type as the expression parameter.

 

Oracle COVAR_POP Function Syntax and Parameters

Aggregate Function

The syntax of the COVAR_POP aggregate function is:

COVAR_POP ( expression1, expression2 )

The parameters of the COVAR_POP function are:

  • expression1 (mandatory): The first set of numbers to use in the calculation.
  • expression2 (mandatory): The second set of numbers to use in the calculation.

If there is a pair of numbers that are both set to NULL, then the function does not consider them.

 

Analytic Function

When you use COVAR_POP as an analytic function, the syntax is:

COVAR_POP (expression1, expression2) OVER (analytic_clause)

The analytic_clause is added here, which is common for analytic queries.

The parameters of the COVAR_POP analytic function are:

  • expression1 (mandatory): The first set of numbers to use in the calculation.
  • expression2 (mandatory): The second set of numbers to use in the calculation.
  • analytic_clause (mandatory): This defines how the analytic function operates. It includes a query partition clause, order by clause, and a windowing clause.

 

Oracle VAR_SAMP Function Syntax and Parameters

The syntax of the VAR_SAMP function is:

VAR_SAMP ( expression ) [OVER (analytic_clause)]

The parameters of the VAR_SAMP function are:

  • expression (mandatory): This is the expression, such as a set of data or a column, to get the sample variance for.
  • analytic_clause (optional): This clause determines how the analytic function behaves.

The expression can be any numeric data type, and returns the same data type as the expression parameter.

 

Oracle COVAR_SAMP Function Syntax and Parameters

Aggregate Function

When using COVAR_SAMP as an aggregate function, the syntax is:

COVAR_SAMP (expression1, expression2)

The parameters of the COVAR_SAMP function are:

  • expression1 (mandatory): The first set of numbers to use in the calculation.
  • expression2 (mandatory): The second set of numbers to use in the calculation.

These two parameters are a numeric data type, and return the same data type as the parameters.

 

Analytic Function

If you want to use the COVAR_SAMP function as an analytic function, the syntax is:

COVAR_SAMP (expression1, expression2) OVER (analytic_clause)

The only difference here is the addition of the analytic_clause, which is quite common for analytic queries.

The parameters of the COVAR_SAMP analytic function are:

  • expression1 (mandatory): The first set of numbers to use in the calculation.
  • expression2 (mandatory): The second set of numbers to use in the calculation.
  • analytic_clause (mandatory): This defines how the analytic function operates. It includes a query partition clause, order by clause, and a windowing clause.

 

Examples of the Oracle VARIANCE Function

Example 1 – simple aggregate

This is an example of the VARIANCE function with a simple set of parameters.

SELECT VARIANCE(fees_paid)
FROM student;

Results:

VARIANCE(FEES_PAID)
38910.25641

 

Example 2 – aggregate using DISTINCT

This example uses the DISTINCT keyword to eliminate duplicates in the variance calculation.

SELECT VARIANCE(DISTINCT fees_paid) AS variance_test
FROM student;

Results:

VARIANCE_TEST
47102.5

 

Example 3 – analytic function

SELECT first_name, last_name, fees_paid,
VARIANCE(fees_paid) OVER (ORDER BY enrolment_date) AS variance_value
FROM student;

Result:

FIRST_NAME LAST_NAME FEES_PAID VARIANCE_VALUE
Susan Johnson 150 0
Mark Holloway 410 33800
Michelle Randall (null) 33800
Tanya Hall 150 22533.33333
Robert Pickering 100 19691.66667
John Smith 100 16870
Julie Armstrong 0 19016.66667
Andrew Cooper 400 24657.14286
Tom Capper 320 23341.07143
Steven Webber 80 22125
John Rogers 700 38910.25641
Jarrad Winston 300 38910.25641
Mary Taylor 100 38910.25641
Mark Anderson 45 38910.25641

This shows the variance for the fees_paid column.

 

Examples of the Oracle CUME_DIST Function

Let’s take a look at some examples of the CUME_DIST function.

Example 1 – single parameter as aggregate

This example uses a single parameter for CUME_DIST.

SELECT
CUME_DIST(300) WITHIN GROUP (ORDER BY fees_paid) AS cume_dist_test
FROM student;

Result:

CUME_DIST_TEST
0.5882352941

This shows the cumulative distribution of a student who has a fees_paid value of 300 within the entire table.

 

Example 2 – multiple parameters as aggregate

This example uses multiple parameters for CUME_DIST.

SELECT
CUME_DIST(300, 200) WITHIN GROUP (ORDER BY fees_paid, fees_required) AS cume_dist_test
FROM student;

Result:

CUME_DIST_TEST
0.5294117647

This shows the cumulative distribution of a student who has a fees_paid value of 300 and a fees_required value of 200 within the entire table.

 

Example 3 – analytic, no partition

This example has no query partition clause.

SELECT first_name, last_name, fees_paid,
CUME_DIST() OVER (ORDER BY fees_paid) AS cume_dist_test
FROM student;

Result:

FIRST_NAME LAST_NAME FEES_PAID CUME_DIST_TEST
Julie Armstrong 0 0.07142857143
Mark Anderson 45 0.1428571429
Steven Webber 80 0.2142857143
John Smith 100 0.4285714286
Robert Pickering 100 0.4285714286
Mary Taylor 100 0.4285714286
Susan Johnson 150 0.5714285714
Tanya Hall 150 0.5714285714
Jarrad Winston 300 0.6428571429
Tom Capper 320 0.7142857143
Andrew Cooper 400 0.7857142857
Mark Holloway 410 0.8571428571
John Rogers 700 0.9285714286
Michelle Randall (null) 1

The CUME_DIST column shows the cumulative distribution of each row’s fees_paid value, with no partitioning.

 

Example 4 – analytic with partition

This example is similar, but includes a query partition clause.

SELECT first_name, last_name, fees_paid,
CUME_DIST() OVER (PARTITION BY address_state ORDER BY fees_paid) AS cume_dist_test
FROM student;

Result:

FIRST_NAME LAST_NAME FEES_PAID CUME_DIST_TEST
Mark Anderson 45 1
Robert Pickering 100 0.5
Susan Johnson 150 1
Michelle Randall 1
Tom Capper 320 0.5
John Rogers 700 1
Steven Webber 80 0.3333333333
John Smith 100 0.6666666667
Mark Holloway 410 1
Mary Taylor 100 1
Julie Armstrong 0 0.3333333333
Tanya Hall 150 0.6666666667
Andrew Cooper 400 1
Jarrad Winston 300 1

This example also shows the cumulative distribution of the fees_paid column for each row, but it is grouped into the address_state values.

 

Examples of the Oracle CORR Function

Here are some examples of the CORR function.

Example 1 – aggregate function

SELECT CORR(fees_paid, fees_required) AS corr_test
FROM student;

Result:

CORR_TEST
0.140645438

 

Example 2 – analytic function

This example uses CORR as an analytic function.

SELECT first_name, last_name,
CORR(fees_paid, fees_required) OVER (ORDER BY first_name) AS corr_value
FROM student;

Result:

FIRST_NAME LAST_NAME CORR_VALUE
Andrew Cooper (null)
Jarrad Winston 1
John Smith -0.5214414501
John Rogers -0.5214414501
Julie Armstrong 0.102484048
Mark Anderson -0.1255519091
Mark Holloway -0.1255519091
Mary Taylor -0.112526085
Michelle Randall -0.112526085
Robert Pickering 0.01309498303
Steven Webber 0.1030915728
Susan Johnson 0.1288908087
Tanya Hall 0.1486480032
Tom Capper 0.140645438

 

Examples of the Oracle VAR_POP Function

Here are some examples of the VAR_POP function, both as an analytic function and an aggregate function.

Example 1 – Aggregate Function

This example shows how to use VAR_POP as an aggregate function

SELECT
VAR_POP(fees_required) AS var_pop_test
FROM student;

Result:

VAR_POP_TEST
66748.97959

 

Example 2 – Analytic Function

This example shows how to use VAR_POP as an analytic function.

SELECT first_name, last_name, address_state,
VAR_POP(fees_required) OVER (PARTITION BY address_state) AS var_pop_test
FROM student;

Result:

FIRST_NAME LAST_NAME ADDRESS_STATE VAR_POP_TEST
Mark Anderson California 0
Susan Johnson Colorado 400
Robert Pickering Colorado 400
Michelle Randall Florida 0
Tom Capper Nevada 4900
John Rogers Nevada 4900
John Smith New York 35555.55556
Mark Holloway New York 35555.55556
Steven Webber New York 35555.55556
Mary Taylor Oregon 0
Andrew Cooper Texas 101666.6667
Tanya Hall Texas 101666.6667
Julie Armstrong Texas 101666.6667
Jarrad Winston Utah 0

 

Examples of the Oracle COVAR_POP Function

Example 1 – aggregate

Here is an example of the COVAR_POP function when used as an aggregate.

SELECT COVAR_POP(fees_paid, fees_required) AS covarpop_value
FROM student;

Result:

COVARPOP_VALUE
7079.585799

 

Example 2 – analytic function

This is an example of the COVAR_POP function as an analytic function.

SELECT first_name,
last_name,
COVAR_POP(fees_paid, fees_required) OVER (ORDER BY last_name) AS covar_pop_value
FROM student;

Result:

FIRST_NAME LAST_NAME COVAR_POP_VALUE
Mark Anderson 0
Julie Armstrong 8550
Tom Capper -2894.444444
Andrew Cooper 16790.625
Tanya Hall 15924
Mark Holloway 14783.33333
Susan Johnson 15360.20408
Robert Pickering 17142.1875
Michelle Randall 17142.1875
John Rogers 6914.197531
John Smith 4282.5
Mary Taylor 2450
Steven Webber 5397.569444
Jarrad Winston 7079.585799

 

Examples of the Oracle VAR_SAMP Function

Now, let’s have a look at some examples of the VAR_SAMP function, both as an analytic function and an aggregate function.

Example 1 – Aggregate Function

This example shows how to use VAR_SAMP as an aggregate function

SELECT VAR_SAMP(fees_required) AS var_samp_test
FROM student;

Result:

VAR_SAMP_TEST
71883.51648

 

Example 2 – Analytic Function

This example shows how to use VAR_SAMP as an analytic function.

SELECT first_name,
last_name,
address_state,
VAR_SAMP(fees_required) OVER (PARTITION BY address_state) AS var_samp_test
FROM student;

Result:

FIRST_NAME LAST_NAME ADDRESS_STATE VAR_SAMP_TEST
Mark Anderson California (null)
Susan Johnson Colorado 800
Robert Pickering Colorado 800
Michelle Randall Florida (null)
Tom Capper Nevada 9800
John Rogers Nevada 9800
John Smith New York 53333.33333
Mark Holloway New York 53333.33333
Steven Webber New York 53333.33333
Mary Taylor Oregon (null)
Andrew Cooper Texas 152500
Tanya Hall Texas 152500
Julie Armstrong Texas 152500
Jarrad Winston Utah (null)

 

Examples of the Oracle COVAR_SAMP Function

Example 1 – Aggregate Function

Here’s an example of the COVAR_SAMP function.

SELECT COVAR_SAMP(fees_paid, fees_required) AS cvsamp
FROM student;

Result:

CVSAMP
7669.551282

 

Example 2 – Analytic Function

Here’s an example of using the COVAR_SAMP as an analytic function.

SELECT first_name,
last_name,
COVAR_SAMP(fees_paid, fees_required) OVER (ORDER BY first_name) AS cvsamp
FROM student;

Result:

FIRST_NAME LAST_NAME CVSAMP
Andrew Cooper
Jarrad Winston 5000
John Smith -33916.66667
John Rogers -33916.66667
Julie Armstrong 8500
Mark Anderson -9046.428571
Mark Holloway -9046.428571
Mary Taylor -7209.821429
Michelle Randall -7209.821429
Robert Pickering 859.7222222
Steven Webber 6752.222222
Susan Johnson 8001.363636
Tanya Hall 8731.818182
Tom Capper 7669.551282

 

If you want to know more about SQL functions, you can find a full list of Oracle SQL functions here.

Lastly, if you enjoy the information and career advice I’ve been providing, sign up to my newsletter below to stay up-to-date on my articles. You’ll also receive a fantastic bonus. Thanks!

1 thought on “Oracle Variance Functions Guide, FAQ, and Examples”

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.