In this article, we’ll look at a range of Variance functions (VARIANCE, CUME_DIST, CORR, VAR_POP, COVAR_POP, VAR_SAMP, and COVAR_SAMP), explain what they do, and see some examples.

These functions can be used as an analytic function or an aggregate function. I’ll explain both formats in this article.

## What is a Variance?

Before we look at the functions, let’s understand what a variance is.

A variance is a mathematical concept, which represents how different the values are in a set of values (such as a column of data).

That link can probably explain it better than I can!

## Purpose of the Oracle Variance Functions

There are quite a few variance functions in Oracle. Let’s take a look at what they all do.

**VARIANCE**

The purpose of the VARIANCE function is to, well, return the variance of a set of numbers.

**CUME_DIST**

The purpose of the CUME_DIST function is to find the cumulative distribution of a value in a list of values.

The value that is returned will be between 0 and 1.

**CORR**

The Oracle CORR function returns the “coefficient of correlation” of a set of number pairs.

What is a coefficient of correlation? Here’s an definition of what it means.

**COVAR_POP**

The Oracle COVAR_POP function calculates the population covariance of a set of number pairs.

What is a “population covariance”? Good question. Here’s an article that explains what the definition is.

**VAR_SAMP**

The VAR_SAMP function will return the sample variance of a set of numbers, after excluding NULLs.

What does this mean?

The sample variation calculates the variation of a sample of numbers from the overall set. The higher the result, the larger the variation between the numbers.

**COVAR_SAMP**

This function returns the covariance of a set of number pairs.

What does that mean? Here’s an explanation of what a covariance is.

**VAR_POP**

The VAR_POP function finds the population variance of a set of numbers, excluding NULLs.

What does this mean?

The population variance is a calculation that determines the difference or variation in a complete set of numbers. The higher the number is, the bigger the variance.

## Oracle VARIANCE Function Syntax and Parameters

### Aggregate Function

The syntax of the VARIANCE function when used as an aggregate function is:

VARIANCE ( [DISTINCT | ALL] expression)

The parameters of the VARIANCE aggregate function are:

- DISTINCT or ALL (optional): If you specify ALL, then all values are considered for the variance. If you specify DISTINCT, only the distinct values are considered in the variance (duplicates are ignored).
- expression (mandatory): This is the expression (such as a column) to be analysed for variance.

The data type can be any numeric data type, and the return value is the same data type that was provided to it.

### Analytic Function

If you’re using the VARIANCE function as an analytic function, then the syntax is:

VARIANCE ( [DISTINCT | ALL] expression) OVER (analytic_clause)

The parameters of VARIANCE when using it as an analytic function are:

- DISTINCT or ALL (optional): If you specify ALL, then all values are considered for the variance. If you specify DISTINCT, only the distinct values are considered in the variance (duplicates are ignored).
- expression (mandatory): This is the expression (such as a column) to be analysed for variance.
- analytic_clause (mandatory): This is where you specify a query partition clause, order by clause,, and windowing clause.

If you specify DISTINCT, then you are only able to specify the query partition clause inside the analytic clause. The order by clause and windowing clause are not allowed.

## Oracle CUME_DIST Syntax and Parameters

### Aggregate Function

The syntax for this function is different depending on how you use it.

As an aggregate function, it looks like this:

CUME_DIST (expression1, … expression_n) WITHIN GROUP (ORDER BY expression_order1, … expression_order_n)

The parameters of the CUME_DIST function as an aggregate function are:

- expression1 (mandatory): The list of expressions to evaluate a distribution for.
- expression_order1 (mandatory): The list of expressions that contain the larger list of values. Expression1 is checked within expression_order1.

You can have more than one expression in each set of parameters, separated by a comma. However, the data types need to match between them.

So, the data type of expression1 needs to match expression_order1, and the data type of expression2 needs to match expression_order2, and so on.

### Analytic Function

As an analytic function, it looks like this:

CUME_DIST() OVER ( [query_partition_clause] ORDER BY order_clause )

When it is run as an analytic function, it performs the cumulative distribution for the row’s value against a range of values. The result is likely different for each row.

The parameters for the analytic version of CUME_DIST are:

- query_partition_clause (optional): You can specify how you want your CUME_DIST value grouped. If this is not specified, the function looks at all values.
- order_by_clause (mandatory): This is the column used to calculate the cumulative distribution.

## Oracle CORR Function Syntax and Parameters

### Aggregate Function

The syntax for CORR as an aggregate function is:

CORR (expression1, expression2)

The parameters of the CORR aggregate function are:

- expression1 (mandatory): The first set of data to use in the coefficient calculation.
- expression2 (mandatory): The second set of data to use in the coefficient calculation.

Both of these fields are mandatory because the calculation needs two sets of data.

### Analytic Function

The syntax of the CORR function as an analytic function is:

CORR (expression1, expression2) OVER (analytic_clause)

Just like the aggregate function, both expressions are required. The analytic_clause is new.

The analytic_clause allows you to specify the query partition clause, the order by clause, and the windowing clause.

## Oracle VAR_POP Function Syntax and Parameters

The syntax of the VAR_POP function is:

VAR_POP ( expression ) [OVER (analytic_clause)]

Parameters

- expression (mandatory): This is the expression, such as a set of data or a column, to perform the population variance calculation on.
- analytic_clause (optional): This is the clause that determines how the analytic function behaves, if using this as an analytic function.

The expression can be any numeric data type, and returns the same data type as the expression parameter.

## Oracle COVAR_POP Function Syntax and Parameters

### Aggregate Function

The syntax of the COVAR_POP aggregate function is:

COVAR_POP ( expression1, expression2 )

The parameters of the COVAR_POP function are:

- expression1 (mandatory): The first set of numbers to use in the calculation.
- expression2 (mandatory): The second set of numbers to use in the calculation.

If there is a pair of numbers that are both set to NULL, then the function does not consider them.

### Analytic Function

When you use COVAR_POP as an analytic function, the syntax is:

COVAR_POP (expression1, expression2) OVER (analytic_clause)

The analytic_clause is added here, which is common for analytic queries.

The parameters of the COVAR_POP analytic function are:

- expression1 (mandatory): The first set of numbers to use in the calculation.
- expression2 (mandatory): The second set of numbers to use in the calculation.
- analytic_clause (mandatory): This defines how the analytic function operates. It includes a query partition clause, order by clause, and a windowing clause.

## Oracle VAR_SAMP Function Syntax and Parameters

The syntax of the VAR_SAMP function is:

VAR_SAMP ( expression ) [OVER (analytic_clause)]

The parameters of the VAR_SAMP function are:

- expression (mandatory): This is the expression, such as a set of data or a column, to get the sample variance for.
- analytic_clause (optional): This clause determines how the analytic function behaves.

The expression can be any numeric data type, and returns the same data type as the expression parameter.

## Oracle COVAR_SAMP Function Syntax and Parameters

### Aggregate Function

When using COVAR_SAMP as an aggregate function, the syntax is:

COVAR_SAMP (expression1, expression2)

The parameters of the COVAR_SAMP function are:

- expression1 (mandatory): The first set of numbers to use in the calculation.
- expression2 (mandatory): The second set of numbers to use in the calculation.

These two parameters are a numeric data type, and return the same data type as the parameters.

### Analytic Function

If you want to use the COVAR_SAMP function as an analytic function, the syntax is:

COVAR_SAMP (expression1, expression2) OVER (analytic_clause)

The only difference here is the addition of the analytic_clause, which is quite common for analytic queries.

The parameters of the COVAR_SAMP analytic function are:

- expression1 (mandatory): The first set of numbers to use in the calculation.
- expression2 (mandatory): The second set of numbers to use in the calculation.
- analytic_clause (mandatory): This defines how the analytic function operates. It includes a query partition clause, order by clause, and a windowing clause.

## Examples of the Oracle VARIANCE Function

### Example 1 – simple aggregate

This is an example of the VARIANCE function with a simple set of parameters.

```
SELECT VARIANCE(fees_paid)
FROM student;
```

Results:

VARIANCE(FEES_PAID) |

38910.25641 |

### Example 2 – aggregate using DISTINCT

This example uses the DISTINCT keyword to eliminate duplicates in the variance calculation.

```
SELECT VARIANCE(DISTINCT fees_paid) AS variance_test
FROM student;
```

Results:

VARIANCE_TEST |

47102.5 |

### Example 3 – analytic function

```
SELECT first_name, last_name, fees_paid,
VARIANCE(fees_paid) OVER (ORDER BY enrolment_date) AS variance_value
FROM student;
```

Result:

FIRST_NAME |
LAST_NAME |
FEES_PAID |
VARIANCE_VALUE |

Susan | Johnson | 150 | 0 |

Mark | Holloway | 410 | 33800 |

Michelle | Randall | (null) | 33800 |

Tanya | Hall | 150 | 22533.33333 |

Robert | Pickering | 100 | 19691.66667 |

John | Smith | 100 | 16870 |

Julie | Armstrong | 0 | 19016.66667 |

Andrew | Cooper | 400 | 24657.14286 |

Tom | Capper | 320 | 23341.07143 |

Steven | Webber | 80 | 22125 |

John | Rogers | 700 | 38910.25641 |

Jarrad | Winston | 300 | 38910.25641 |

Mary | Taylor | 100 | 38910.25641 |

Mark | Anderson | 45 | 38910.25641 |

This shows the variance for the fees_paid column.

## Examples of the Oracle CUME_DIST Function

Let’s take a look at some examples of the CUME_DIST function.

### Example 1 – single parameter as aggregate

This example uses a single parameter for CUME_DIST.

```
SELECT
CUME_DIST(300) WITHIN GROUP (ORDER BY fees_paid) AS cume_dist_test
FROM student;
```

Result:

CUME_DIST_TEST |

0.5882352941 |

This shows the cumulative distribution of a student who has a fees_paid value of 300 within the entire table.

### Example 2 – multiple parameters as aggregate

This example uses multiple parameters for CUME_DIST.

```
SELECT
CUME_DIST(300, 200) WITHIN GROUP (ORDER BY fees_paid, fees_required) AS cume_dist_test
FROM student;
```

Result:

CUME_DIST_TEST |

0.5294117647 |

This shows the cumulative distribution of a student who has a fees_paid value of 300 and a fees_required value of 200 within the entire table.

### Example 3 – analytic, no partition

This example has no query partition clause.

```
SELECT first_name, last_name, fees_paid,
CUME_DIST() OVER (ORDER BY fees_paid) AS cume_dist_test
FROM student;
```

Result:

FIRST_NAME |
LAST_NAME |
FEES_PAID |
CUME_DIST_TEST |

Julie | Armstrong | 0 | 0.07142857143 |

Mark | Anderson | 45 | 0.1428571429 |

Steven | Webber | 80 | 0.2142857143 |

John | Smith | 100 | 0.4285714286 |

Robert | Pickering | 100 | 0.4285714286 |

Mary | Taylor | 100 | 0.4285714286 |

Susan | Johnson | 150 | 0.5714285714 |

Tanya | Hall | 150 | 0.5714285714 |

Jarrad | Winston | 300 | 0.6428571429 |

Tom | Capper | 320 | 0.7142857143 |

Andrew | Cooper | 400 | 0.7857142857 |

Mark | Holloway | 410 | 0.8571428571 |

John | Rogers | 700 | 0.9285714286 |

Michelle | Randall | (null) | 1 |

The CUME_DIST column shows the cumulative distribution of each row’s fees_paid value, with no partitioning.

### Example 4 – analytic with partition

This example is similar, but includes a query partition clause.

```
SELECT first_name, last_name, fees_paid,
CUME_DIST() OVER (PARTITION BY address_state ORDER BY fees_paid) AS cume_dist_test
FROM student;
```

Result:

FIRST_NAME |
LAST_NAME |
FEES_PAID |
CUME_DIST_TEST |

Mark | Anderson | 45 | 1 |

Robert | Pickering | 100 | 0.5 |

Susan | Johnson | 150 | 1 |

Michelle | Randall | 1 | |

Tom | Capper | 320 | 0.5 |

John | Rogers | 700 | 1 |

Steven | Webber | 80 | 0.3333333333 |

John | Smith | 100 | 0.6666666667 |

Mark | Holloway | 410 | 1 |

Mary | Taylor | 100 | 1 |

Julie | Armstrong | 0 | 0.3333333333 |

Tanya | Hall | 150 | 0.6666666667 |

Andrew | Cooper | 400 | 1 |

Jarrad | Winston | 300 | 1 |

This example also shows the cumulative distribution of the fees_paid column for each row, but it is grouped into the address_state values.

## Examples of the Oracle CORR Function

Here are some examples of the CORR function.

### Example 1 – aggregate function

```
SELECT CORR(fees_paid, fees_required) AS corr_test
FROM student;
```

Result:

CORR_TEST |

0.140645438 |

### Example 2 – analytic function

This example uses CORR as an analytic function.

```
SELECT first_name, last_name,
CORR(fees_paid, fees_required) OVER (ORDER BY first_name) AS corr_value
FROM student;
```

Result:

FIRST_NAME |
LAST_NAME |
CORR_VALUE |

Andrew | Cooper | (null) |

Jarrad | Winston | 1 |

John | Smith | -0.5214414501 |

John | Rogers | -0.5214414501 |

Julie | Armstrong | 0.102484048 |

Mark | Anderson | -0.1255519091 |

Mark | Holloway | -0.1255519091 |

Mary | Taylor | -0.112526085 |

Michelle | Randall | -0.112526085 |

Robert | Pickering | 0.01309498303 |

Steven | Webber | 0.1030915728 |

Susan | Johnson | 0.1288908087 |

Tanya | Hall | 0.1486480032 |

Tom | Capper | 0.140645438 |

## Examples of the Oracle VAR_POP Function

Here are some examples of the VAR_POP function, both as an analytic function and an aggregate function.

### Example 1 – Aggregate Function

This example shows how to use VAR_POP as an aggregate function

```
SELECT
VAR_POP(fees_required) AS var_pop_test
FROM student;
```

Result:

VAR_POP_TEST |

66748.97959 |

### Example 2 – Analytic Function

This example shows how to use VAR_POP as an analytic function.

```
SELECT first_name, last_name, address_state,
VAR_POP(fees_required) OVER (PARTITION BY address_state) AS var_pop_test
FROM student;
```

Result:

FIRST_NAME |
LAST_NAME |
ADDRESS_STATE |
VAR_POP_TEST |

Mark | Anderson | California | 0 |

Susan | Johnson | Colorado | 400 |

Robert | Pickering | Colorado | 400 |

Michelle | Randall | Florida | 0 |

Tom | Capper | Nevada | 4900 |

John | Rogers | Nevada | 4900 |

John | Smith | New York | 35555.55556 |

Mark | Holloway | New York | 35555.55556 |

Steven | Webber | New York | 35555.55556 |

Mary | Taylor | Oregon | 0 |

Andrew | Cooper | Texas | 101666.6667 |

Tanya | Hall | Texas | 101666.6667 |

Julie | Armstrong | Texas | 101666.6667 |

Jarrad | Winston | Utah | 0 |

## Examples of the Oracle COVAR_POP Function

### Example 1 – aggregate

Here is an example of the COVAR_POP function when used as an aggregate.

```
SELECT COVAR_POP(fees_paid, fees_required) AS covarpop_value
FROM student;
```

Result:

COVARPOP_VALUE |

7079.585799 |

### Example 2 – analytic function

This is an example of the COVAR_POP function as an analytic function.

```
SELECT first_name,
last_name,
COVAR_POP(fees_paid, fees_required) OVER (ORDER BY last_name) AS covar_pop_value
FROM student;
```

Result:

FIRST_NAME |
LAST_NAME |
COVAR_POP_VALUE |

Mark | Anderson | 0 |

Julie | Armstrong | 8550 |

Tom | Capper | -2894.444444 |

Andrew | Cooper | 16790.625 |

Tanya | Hall | 15924 |

Mark | Holloway | 14783.33333 |

Susan | Johnson | 15360.20408 |

Robert | Pickering | 17142.1875 |

Michelle | Randall | 17142.1875 |

John | Rogers | 6914.197531 |

John | Smith | 4282.5 |

Mary | Taylor | 2450 |

Steven | Webber | 5397.569444 |

Jarrad | Winston | 7079.585799 |

## Examples of the Oracle VAR_SAMP Function

Now, let’s have a look at some examples of the VAR_SAMP function, both as an analytic function and an aggregate function.

### Example 1 – Aggregate Function

This example shows how to use VAR_SAMP as an aggregate function

```
SELECT VAR_SAMP(fees_required) AS var_samp_test
FROM student;
```

Result:

VAR_SAMP_TEST |

71883.51648 |

### Example 2 – Analytic Function

This example shows how to use VAR_SAMP as an analytic function.

```
SELECT first_name,
last_name,
address_state,
VAR_SAMP(fees_required) OVER (PARTITION BY address_state) AS var_samp_test
FROM student;
```

Result:

FIRST_NAME |
LAST_NAME |
ADDRESS_STATE |
VAR_SAMP_TEST |

Mark | Anderson | California | (null) |

Susan | Johnson | Colorado | 800 |

Robert | Pickering | Colorado | 800 |

Michelle | Randall | Florida | (null) |

Tom | Capper | Nevada | 9800 |

John | Rogers | Nevada | 9800 |

John | Smith | New York | 53333.33333 |

Mark | Holloway | New York | 53333.33333 |

Steven | Webber | New York | 53333.33333 |

Mary | Taylor | Oregon | (null) |

Andrew | Cooper | Texas | 152500 |

Tanya | Hall | Texas | 152500 |

Julie | Armstrong | Texas | 152500 |

Jarrad | Winston | Utah | (null) |

## Examples of the Oracle COVAR_SAMP Function

### Example 1 – Aggregate Function

Here’s an example of the COVAR_SAMP function.

```
SELECT COVAR_SAMP(fees_paid, fees_required) AS cvsamp
FROM student;
```

Result:

CVSAMP |

7669.551282 |

### Example 2 – Analytic Function

Here’s an example of using the COVAR_SAMP as an analytic function.

```
SELECT first_name,
last_name,
COVAR_SAMP(fees_paid, fees_required) OVER (ORDER BY first_name) AS cvsamp
FROM student;
```

Result:

FIRST_NAME |
LAST_NAME |
CVSAMP |

Andrew | Cooper | |

Jarrad | Winston | 5000 |

John | Smith | -33916.66667 |

John | Rogers | -33916.66667 |

Julie | Armstrong | 8500 |

Mark | Anderson | -9046.428571 |

Mark | Holloway | -9046.428571 |

Mary | Taylor | -7209.821429 |

Michelle | Randall | -7209.821429 |

Robert | Pickering | 859.7222222 |

Steven | Webber | 6752.222222 |

Susan | Johnson | 8001.363636 |

Tanya | Hall | 8731.818182 |

Tom | Capper | 7669.551282 |

If you want to know more about SQL functions, you can find a full list of Oracle SQL functions here.

Lastly, if you enjoy the information and career advice I’ve been providing, sign up to my newsletter below to stay up-to-date on my articles. You’ll also receive a fantastic bonus. Thanks!

piyaVery interesting to read this article.I would like to thank you for the efforts you had made for writing this awesome article. This article inspired me to read more. keep it up.

Correlation vs Covariance

Simple linear regression

data science interview questions