Oracle Variance Functions Guide, FAQ, and Examples

Dec 26, 2017 · 12 min read · sql-functions ·

In this article, we’ll look at a range of Variance functions (VARIANCE, CUME_DIST, CORR, VAR_POP, COVAR_POP, VAR_SAMP, and COVAR_SAMP), explain what they do, and see some examples.

These functions can be used as an window function or an aggregate function. I’ll explain both formats in this article.

What is a Variance?

Before we look at the functions, let’s understand what a variance is.

A variance is a mathematical concept, which represents how different the values are in a set of values (such as a column of data).

That link can probably explain it better than I can!

Purpose of the Oracle Variance Functions

There are quite a few variance functions in Oracle. Let’s take a look at what they all do.

VARIANCE

The purpose of the VARIANCE function is to, well, return the variance of a set of numbers.

CUME_DIST

The purpose of the CUME_DIST function is to find the cumulative distribution of a value in a list of values.

The value that is returned will be between 0 and 1.

CORR

The Oracle CORR function returns the “coefficient of correlation” of a set of number pairs.

What is a coefficient of correlation? Here’s an definition of what it means.

COVAR_POP

The Oracle COVAR_POP function calculates the population covariance of a set of number pairs.

What is a “population covariance”? Good question. Here’s an article that explains what the definition is.

VAR_SAMP

The VAR_SAMP function will return the sample variance of a set of numbers, after excluding NULLs.

What does this mean?

The sample variation calculates the variation of a sample of numbers from the overall set. The higher the result, the larger the variation between the numbers.

COVAR_SAMP

This function returns the covariance of a set of number pairs.

What does that mean? Here’s an explanation of what a covariance is.

VAR_POP

The VAR_POP function finds the population variance of a set of numbers, excluding NULLs.

What does this mean?

The population variance is a calculation that determines the difference or variation in a complete set of numbers. The higher the number is, the bigger the variance.

[thrive_leads id='10502']

Oracle VARIANCE Function Syntax and Parameters

Aggregate Function

The syntax of the VARIANCE function when used as an aggregate function is:

1VARIANCE ( [DISTINCT | ALL] expression)

The parameters of the VARIANCE aggregate function are:

DISTINCT or ALL (optional): If you specify ALL, then all values are considered for the variance. If you specify DISTINCT, only the distinct values are considered in the variance (duplicates are ignored).
expression (mandatory): This is the expression (such as a column) to be analysed for variance.

The data type can be any numeric data type, and the return value is the same data type that was provided to it.

Analytic Function

If you’re using the VARIANCE function as an analytic function, then the syntax is:

1VARIANCE ( [DISTINCT | ALL] expression) OVER (analytic_clause)

The parameters of VARIANCE when using it as an analytic function are:

DISTINCT or ALL (optional): If you specify ALL, then all values are considered for the variance. If you specify DISTINCT, only the distinct values are considered in the variance (duplicates are ignored).
expression (mandatory): This is the expression (such as a column) to be analysed for variance.
analytic_clause (mandatory): This is where you specify a query partition clause, order by clause,, and windowing clause.

If you specify DISTINCT, then you are only able to specify the query partition clause inside the analytic clause. The order by clause and windowing clause are not allowed.

Oracle CUME_DIST Syntax and Parameters

Aggregate Function

The syntax for this function is different depending on how you use it.

As an aggregate function, it looks like this:

1CUME_DIST (expression1, … expression_n) WITHIN GROUP (ORDER BY expression_order1, … expression_order_n)

The parameters of the CUME_DIST function as an aggregate function are:

expression1 (mandatory): The list of expressions to evaluate a distribution for.
expression_order1 (mandatory): The list of expressions that contain the larger list of values. Expression1 is checked within expression_order1.

You can have more than one expression in each set of parameters, separated by a comma. However, the data types need to match between them.

So, the data type of expression1 needs to match expression_order1, and the data type of expression2 needs to match expression_order2, and so on.

Analytic Function

As an analytic function, it looks like this:

1CUME_DIST() OVER ( [query_partition_clause] ORDER BY order_clause )

When it is run as an analytic function, it performs the cumulative distribution for the row’s value against a range of values. The result is likely different for each row.

The parameters for the analytic version of CUME_DIST are:

query_partition_clause (optional): You can specify how you want your CUME_DIST value grouped. If this is not specified, the function looks at all values.
order_by_clause (mandatory): This is the column used to calculate the cumulative distribution.

Oracle CORR Function Syntax and Parameters

Aggregate Function

The syntax for CORR as an aggregate function is:

1CORR (expression1, expression2)

The parameters of the CORR aggregate function are:

expression1 (mandatory): The first set of data to use in the coefficient calculation.
expression2 (mandatory): The second set of data to use in the coefficient calculation.

Both of these fields are mandatory because the calculation needs two sets of data.

Analytic Function

The syntax of the CORR function as an analytic function is:

1CORR (expression1, expression2) OVER (analytic_clause)

Just like the aggregate function, both expressions are required. The analytic_clause is new.

The analytic_clause allows you to specify the query partition clause, the order by clause, and the windowing clause.

Oracle VAR_POP Function Syntax and Parameters

The syntax of the VAR_POP function is:

1VAR_POP ( expression ) [OVER (analytic_clause)]

Parameters

expression (mandatory): This is the expression, such as a set of data or a column, to perform the population variance calculation on.
analytic_clause (optional): This is the clause that determines how the analytic function behaves, if using this as an analytic function.

The expression can be any numeric data type, and returns the same data type as the expression parameter.

Oracle COVAR_POP Function Syntax and Parameters

Aggregate Function

The syntax of the COVAR_POP aggregate function is:

1COVAR_POP ( expression1, expression2 )

The parameters of the COVAR_POP function are:

expression1 (mandatory): The first set of numbers to use in the calculation.
expression2 (mandatory): The second set of numbers to use in the calculation.

If there is a pair of numbers that are both set to NULL, then the function does not consider them.

Analytic Function

When you use COVAR_POP as an analytic function, the syntax is:

1COVAR_POP (expression1, expression2) OVER (analytic_clause)

The analytic_clause is added here, which is common for analytic queries.

The parameters of the COVAR_POP analytic function are:

expression1 (mandatory): The first set of numbers to use in the calculation.
expression2 (mandatory): The second set of numbers to use in the calculation.
analytic_clause (mandatory): This defines how the analytic function operates. It includes a query partition clause, order by clause, and a windowing clause.

Oracle VAR_SAMP Function Syntax and Parameters

The syntax of the VAR_SAMP function is:

1VAR_SAMP ( expression ) [OVER (analytic_clause)]

The parameters of the VAR_SAMP function are:

expression (mandatory): This is the expression, such as a set of data or a column, to get the sample variance for.
analytic_clause (optional): This clause determines how the analytic function behaves.

The expression can be any numeric data type, and returns the same data type as the expression parameter.

Oracle COVAR_SAMP Function Syntax and Parameters

Aggregate Function

When using COVAR_SAMP as an aggregate function, the syntax is:

1COVAR_SAMP (expression1, expression2)

The parameters of the COVAR_SAMP function are:

expression1 (mandatory): The first set of numbers to use in the calculation.
expression2 (mandatory): The second set of numbers to use in the calculation.

These two parameters are a numeric data type, and return the same data type as the parameters.

Analytic Function

If you want to use the COVAR_SAMP function as an analytic function, the syntax is:

1COVAR_SAMP (expression1, expression2) OVER (analytic_clause)

The only difference here is the addition of the analytic_clause, which is quite common for analytic queries.

The parameters of the COVAR_SAMP analytic function are:

expression1 (mandatory): The first set of numbers to use in the calculation.
expression2 (mandatory): The second set of numbers to use in the calculation.
analytic_clause (mandatory): This defines how the analytic function operates. It includes a query partition clause, order by clause, and a windowing clause.

Examples of the Oracle VARIANCE Function

Example 1 – simple aggregate

This is an example of the VARIANCE function with a simple set of parameters.

1SELECT VARIANCE(fees_paid)
2FROM student;

Results:

VARIANCE(FEES_PAID)
38910.25641

Example 2 – aggregate using DISTINCT

This example uses the DISTINCT keyword to eliminate duplicates in the variance calculation.

1SELECT VARIANCE(DISTINCT fees_paid) AS variance_test
2FROM student;

Results:

VARIANCE_TEST
47102.5

Example 3 - analytic function

1SELECT first_name, last_name, fees_paid,
2VARIANCE(fees_paid) OVER (ORDER BY enrolment_date) AS variance_value
3FROM student;

Result:

FIRST_NAME	LAST_NAME	FEES_PAID	VARIANCE_VALUE
Susan	Johnson	150	0
Mark	Holloway	410	33800
Michelle	Randall	(null)	33800
Tanya	Hall	150	22533.33333
Robert	Pickering	100	19691.66667
John	Smith	100	16870
Julie	Armstrong	0	19016.66667
Andrew	Cooper	400	24657.14286
Tom	Capper	320	23341.07143
Steven	Webber	80	22125
John	Rogers	700	38910.25641
Jarrad	Winston	300	38910.25641
Mary	Taylor	100	38910.25641
Mark	Anderson	45	38910.25641

This shows the variance for the fees_paid column.

Examples of the Oracle CUME_DIST Function

Let’s take a look at some examples of the CUME_DIST function.

Example 1 – single parameter as aggregate

This example uses a single parameter for CUME_DIST.

1SELECT
2CUME_DIST(300) WITHIN GROUP (ORDER BY fees_paid) AS cume_dist_test
3FROM student;

Result:

CUME_DIST_TEST
0.5882352941

This shows the cumulative distribution of a student who has a fees_paid value of 300 within the entire table.

Example 2 – multiple parameters as aggregate

This example uses multiple parameters for CUME_DIST.

1SELECT
2CUME_DIST(300, 200) WITHIN GROUP (ORDER BY fees_paid, fees_required) AS cume_dist_test
3FROM student;

Result:

CUME_DIST_TEST
0.5294117647

This shows the cumulative distribution of a student who has a fees_paid value of 300 and a fees_required value of 200 within the entire table.

Example 3 – analytic, no partition

This example has no query partition clause.

1SELECT first_name, last_name, fees_paid,
2CUME_DIST() OVER (ORDER BY fees_paid) AS cume_dist_test
3FROM student;

Result:

FIRST_NAME	LAST_NAME	FEES_PAID	CUME_DIST_TEST
Julie	Armstrong	0	0.07142857143
Mark	Anderson	45	0.1428571429
Steven	Webber	80	0.2142857143
John	Smith	100	0.4285714286
Robert	Pickering	100	0.4285714286
Mary	Taylor	100	0.4285714286
Susan	Johnson	150	0.5714285714
Tanya	Hall	150	0.5714285714
Jarrad	Winston	300	0.6428571429
Tom	Capper	320	0.7142857143
Andrew	Cooper	400	0.7857142857
Mark	Holloway	410	0.8571428571
John	Rogers	700	0.9285714286
Michelle	Randall	(null)	1

The CUME_DIST column shows the cumulative distribution of each row’s fees_paid value, with no partitioning.

Example 4 – analytic with partition

This example is similar, but includes a query partition clause.

1SELECT first_name, last_name, fees_paid,
2CUME_DIST() OVER (PARTITION BY address_state ORDER BY fees_paid) AS cume_dist_test
3FROM student;

Result:

FIRST_NAME	LAST_NAME	FEES_PAID	CUME_DIST_TEST
Mark	Anderson	45	1
Robert	Pickering	100	0.5
Susan	Johnson	150	1
Michelle	Randall		1
Tom	Capper	320	0.5
John	Rogers	700	1
Steven	Webber	80	0.3333333333
John	Smith	100	0.6666666667
Mark	Holloway	410	1
Mary	Taylor	100	1
Julie	Armstrong	0	0.3333333333
Tanya	Hall	150	0.6666666667
Andrew	Cooper	400	1
Jarrad	Winston	300	1

This example also shows the cumulative distribution of the fees_paid column for each row, but it is grouped into the address_state values.

Examples of the Oracle CORR Function

Here are some examples of the CORR function.

Example 1 - aggregate function

1SELECT CORR(fees_paid, fees_required) AS corr_test
2FROM student;

Result:

CORR_TEST
0.140645438

Example 2 – analytic function

This example uses CORR as an analytic function.

1SELECT first_name, last_name,
2CORR(fees_paid, fees_required) OVER (ORDER BY first_name) AS corr_value
3FROM student;

Result:

FIRST_NAME	LAST_NAME	CORR_VALUE
Andrew	Cooper	(null)
Jarrad	Winston	1
John	Smith	-0.5214414501
John	Rogers	-0.5214414501
Julie	Armstrong	0.102484048
Mark	Anderson	-0.1255519091
Mark	Holloway	-0.1255519091
Mary	Taylor	-0.112526085
Michelle	Randall	-0.112526085
Robert	Pickering	0.01309498303
Steven	Webber	0.1030915728
Susan	Johnson	0.1288908087
Tanya	Hall	0.1486480032
Tom	Capper	0.140645438

Examples of the Oracle VAR_POP Function

Here are some examples of the VAR_POP function, both as an analytic function and an aggregate function.

Example 1 – Aggregate Function

This example shows how to use VAR_POP as an aggregate function

1SELECT
2VAR_POP(fees_required) AS var_pop_test
3FROM student;

Result:

VAR_POP_TEST
66748.97959

Example 2 – Analytic Function

This example shows how to use VAR_POP as an analytic function.

1SELECT first_name, last_name, address_state,
2VAR_POP(fees_required) OVER (PARTITION BY address_state) AS var_pop_test
3FROM student;

Result:

FIRST_NAME	LAST_NAME	ADDRESS_STATE	VAR_POP_TEST
Mark	Anderson	California	0
Susan	Johnson	Colorado	400
Robert	Pickering	Colorado	400
Michelle	Randall	Florida	0
Tom	Capper	Nevada	4900
John	Rogers	Nevada	4900
John	Smith	New York	35555.55556
Mark	Holloway	New York	35555.55556
Steven	Webber	New York	35555.55556
Mary	Taylor	Oregon	0
Andrew	Cooper	Texas	101666.6667
Tanya	Hall	Texas	101666.6667
Julie	Armstrong	Texas	101666.6667
Jarrad	Winston	Utah	0

Examples of the Oracle COVAR_POP Function

Example 1 - aggregate

Here is an example of the COVAR_POP function when used as an aggregate.

1SELECT COVAR_POP(fees_paid, fees_required) AS covarpop_value
2FROM student;

Result:

COVARPOP_VALUE
7079.585799

Example 2 - analytic function

This is an example of the COVAR_POP function as an analytic function.

1SELECT first_name,
2last_name,
3COVAR_POP(fees_paid, fees_required) OVER (ORDER BY last_name) AS covar_pop_value
4FROM student;

Result:

FIRST_NAME	LAST_NAME	COVAR_POP_VALUE
Mark	Anderson	0
Julie	Armstrong	8550
Tom	Capper	-2894.444444
Andrew	Cooper	16790.625
Tanya	Hall	15924
Mark	Holloway	14783.33333
Susan	Johnson	15360.20408
Robert	Pickering	17142.1875
Michelle	Randall	17142.1875
John	Rogers	6914.197531
John	Smith	4282.5
Mary	Taylor	2450
Steven	Webber	5397.569444
Jarrad	Winston	7079.585799

Examples of the Oracle VAR_SAMP Function

Now, let’s have a look at some examples of the VAR_SAMP function, both as an analytic function and an aggregate function.

Example 1 – Aggregate Function

This example shows how to use VAR_SAMP as an aggregate function

1SELECT VAR_SAMP(fees_required) AS var_samp_test
2FROM student;

Result:

VAR_SAMP_TEST
71883.51648

Example 2 – Analytic Function

This example shows how to use VAR_SAMP as an analytic function.

1SELECT first_name,
2last_name,
3address_state,
4VAR_SAMP(fees_required) OVER (PARTITION BY address_state) AS var_samp_test
5FROM student;

Result:

FIRST_NAME	LAST_NAME	ADDRESS_STATE	VAR_SAMP_TEST
Mark	Anderson	California	(null)
Susan	Johnson	Colorado	800
Robert	Pickering	Colorado	800
Michelle	Randall	Florida	(null)
Tom	Capper	Nevada	9800
John	Rogers	Nevada	9800
John	Smith	New York	53333.33333
Mark	Holloway	New York	53333.33333
Steven	Webber	New York	53333.33333
Mary	Taylor	Oregon	(null)
Andrew	Cooper	Texas	152500
Tanya	Hall	Texas	152500
Julie	Armstrong	Texas	152500
Jarrad	Winston	Utah	(null)

Examples of the Oracle COVAR_SAMP Function

Example 1 – Aggregate Function

Here’s an example of the COVAR_SAMP function.

1SELECT COVAR_SAMP(fees_paid, fees_required) AS cvsamp
2FROM student;

Result:

CVSAMP
7669.551282

Example 2 – Analytic Function

Here’s an example of using the COVAR_SAMP as an analytic function.

1SELECT first_name,
2last_name,
3COVAR_SAMP(fees_paid, fees_required) OVER (ORDER BY first_name) AS cvsamp
4FROM student;

Result:

FIRST_NAME	LAST_NAME	CVSAMP
Andrew	Cooper
Jarrad	Winston	5000
John	Smith	-33916.66667
John	Rogers	-33916.66667
Julie	Armstrong	8500
Mark	Anderson	-9046.428571
Mark	Holloway	-9046.428571
Mary	Taylor	-7209.821429
Michelle	Randall	-7209.821429
Robert	Pickering	859.7222222
Steven	Webber	6752.222222
Susan	Johnson	8001.363636
Tanya	Hall	8731.818182
Tom	Capper	7669.551282

If you want to know more about SQL functions, you can find a full list of Oracle SQL functions here.