How to Find the Row That Has the Maximum Value for a Column in Oracle

Jul 25, 2016 · 7 min read · sql ·

Do you need to find the data for a row that has the maximum value for a specific column in Oracle? Learn how to do it and see some alternative answers in this article.

The Table Structure

Let's say you have a table called sales_volume that looks like this:

CITY	START_DATE	SALES
El Paso	27/May/15	4493
Wilmington	16/Nov/15	4656
Richmond	11/May/15	3739
Irvine	23/Jan/15	1356
Berkeley	18/May/15	263
Pasadena	20/Oct/15	969
Chandler	30/Nov/15	3679
Montgomery	03/Oct/15	4744
Miramar	23/May/15	3754
Elk Grove	22/Jul/15	1684

You can see there is a range of records here with different data.

The Problem - Select the Row That Has the Maximum Value

Now, given you have the table above (or something like it), how could you do the following:

For each different city, find the rows that match the highest start_date value for each city.

So, I would expect an output like this:

CITY	START_DATE	SALES
Abilene	31/Dec/15	2611
Akron	31/Dec/15	621
Albuquerque	31/Dec/15	4397
Alexandria	31/Dec/15	3944

How can I do this? How can I find the row that has the maximum value?

Also, if there are multiple ways to do it, which is the best way? After describing all of the solutions, I'll compare the performance of each of them.

Note: With each of the outputs below, I've only displayed a small subset, as the full result contains a few thousand rows.

Solution 1 - LEFT OUTER JOIN

The first solution uses a LEFT OUTER JOIN on another instance of the same table.

1SELECT
2s1.city,
3s1.start_date,
4s1.sales
5FROM sales_volume s1
6LEFT OUTER JOIN sales_volume s2 ON (s1.city = s2.city AND s1.sales < s2.sales)
7WHERE s2.city IS NULL
8ORDER BY city;

Result:

CITY	START_DATE	SALES
Abilene	31/Dec/15	2,611
Abilene	31/Dec/15	4,899
Abilene	31/Dec/15	4,177
Abilene	31/Dec/15	3,142
Abilene	31/Dec/15	4,862
Akron	31/Dec/15	621
Akron	31/Dec/15	3,442
Akron	31/Dec/15	4,939
Akron	31/Dec/15	2,249
Akron	31/Dec/15	4,929
Akron	31/Dec/15	1,640
Akron	31/Dec/15	999
Akron	31/Dec/15	2,157
Akron	31/Dec/15	4,811
Akron	31/Dec/15	862
Akron	31/Dec/15	3,664
Albuquerque	31/Dec/15	4,397
Albuquerque	31/Dec/15	398
Albuquerque	31/Dec/15	3,750
Albuquerque	31/Dec/15	4,561
Albuquerque	31/Dec/15	620
Albuquerque	31/Dec/15	3,935
Albuquerque	31/Dec/15	4,820
Albuquerque	31/Dec/15	1,640
Albuquerque	31/Dec/15	2,173
Albuquerque	31/Dec/15	3,318

This query selects all of the fields from the sales_volume table where no other row exists for the same city and with a greater start_date value.

Solution 2 - MAX and DENSE_RANK

Another way of getting the data you need is to use the MAX function.

1SELECT
2s.city,
3MAX(s.start_date) KEEP (DENSE_RANK FIRST ORDER BY sales DESC) AS start_date
4FROM sales_volume s
5GROUP BY s.city
6ORDER BY s.city;

Now, the MAX function is used to find the value of the highest record. It's used as an analytical function by adding the KEEP clause at the end.

The KEEP clause, along with DENSE RANK FIRST tells the function to keep the first value that it finds when looking for the sales value. But, the ORDER BY start_date DESC part tells the function to order the fields by the start_date in descending order. This tells the MAX function to find the row that matches the highest start_date value.

However, the drawback with this solution is that it does not show the entire record.

Solution 3 - Correlated Subquery

Another way to use SQL to select the row with the highest value is to use MAX with a correlated subquery.

 1SELECT
 2s1.city,
 3s1.start_date,
 4s1.sales
 5FROM sales_volume s1
 6WHERE s1.sales = (
 7  SELECT MAX(sales)
 8  FROM sales_volume s2
 9  WHERE s1.city = s2.city
10)
11ORDER BY s1.city;

This query may not be very efficient, as it uses this correlated subquery. But, we will see what happens when we do the performance test at the end of this post.

Solution 4 - MAX and Partition By

The last solution that I'll be demonstrating for using SQL to select a max row is this one, which uses the MAX function and the PARTITION BY clause.

 1SELECT
 2city,
 3start_date,
 4sales
 5FROM (
 6  SELECT
 7  city,
 8  start_date,
 9  sales,
10  MAX(sales) OVER (PARTITION BY city) AS max_sales
11  FROM sales_volume)
12WHERE sales = max_sales
13ORDER BY city;

This uses the MAX function as an analytical query. It uses an inline view (having an SQL query as the FROM clause). Within this view, it adds an extra column which is the maximum start_date value which is partitioned by city. This field has the same value for each city, which is the maximum.

Then, outside this view, in the WHERE clause, it only shows those where the start_date value is the maximum.

Performance Comparison

Now that we have a few different solutions, all showing the correct records, let's compare the performance of each of them.

The table that I used only had a small number of records. Let's expand this with some random data up to 1 million records and see how it performs.

I'll insert the extra records, add an index on the city column and the sales, show the actual run times for each, and check the EXPLAIN PLAN for each solution.

Here are the EXPLAIN PLANs for each of my solutions.

1SELECT
2PLAN_TABLE_OUTPUT
3FROM TABLE(DBMS_XPLAN.DISPLAY());

Solution 1

 1------------------------------------------------------------------------------------------------
 2| Id  | Operation               | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
 3------------------------------------------------------------------------------------------------
 4|   0 | SELECT STATEMENT        |              | 10000 |   390K|       |  5160   (1)| 00:01:02 |
 5|   1 |  SORT ORDER BY          |              | 10000 |   390K|   480K|  5160   (1)| 00:01:02 |
 6|*  2 |   FILTER                |              |       |       |       |            |          |
 7|*  3 |    HASH JOIN RIGHT OUTER|              | 10000 |   390K|    28M|  5054   (1)| 00:01:01 |
 8|   4 |     TABLE ACCESS FULL   | SALES_VOLUME |  1000K|    17M|       |  1008   (1)| 00:00:13 |
 9|   5 |     TABLE ACCESS FULL   | SALES_VOLUME |  1000K|    20M|       |  1009   (1)| 00:00:13 |
10------------------------------------------------------------------------------------------------

Solution 2

1-----------------------------------------------------------------------------------
2| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
3-----------------------------------------------------------------------------------
4|   0 | SELECT STATEMENT   |              |   286 |  6292 |  1035   (4)| 00:00:13 |
5|   1 |  SORT GROUP BY     |              |   286 |  6292 |  1035   (4)| 00:00:13 |
6|   2 |   TABLE ACCESS FULL| SALES_VOLUME |  1000K|    20M|  1009   (1)| 00:00:13 |
7-----------------------------------------------------------------------------------

Solution 3

 1--------------------------------------------------------------------------------------
 2| Id  | Operation             | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
 3--------------------------------------------------------------------------------------
 4|   0 | SELECT STATEMENT      |              |  2739 |   109K|  2048   (3)| 00:00:25 |
 5|   1 |  SORT ORDER BY        |              |  2739 |   109K|  2048   (3)| 00:00:25 |
 6|*  2 |   HASH JOIN           |              |  2739 |   109K|  2047   (3)| 00:00:25 |
 7|   3 |    VIEW               | VW_SQ_1      |   286 |  5434 |  1034   (3)| 00:00:13 |
 8|   4 |     HASH GROUP BY     |              |   286 |  5148 |  1034   (3)| 00:00:13 |
 9|   5 |      TABLE ACCESS FULL| SALES_VOLUME |  1000K|    17M|  1008   (1)| 00:00:13 |
10|   6 |    TABLE ACCESS FULL  | SALES_VOLUME |  1000K|    20M|  1009   (1)| 00:00:13 |
11--------------------------------------------------------------------------------------

Solution 4

1--------------------------------------------------------------------------------------------
2| Id  | Operation           | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
3--------------------------------------------------------------------------------------------
4|   0 | SELECT STATEMENT    |              |  1000K|    55M|       |  7648   (1)| 00:01:32 |
5|*  1 |  VIEW               |              |  1000K|    55M|       |  7648   (1)| 00:01:32 |
6|   2 |   WINDOW SORT       |              |  1000K|    20M|    30M|  7648   (1)| 00:01:32 |
7|   3 |    TABLE ACCESS FULL| SALES_VOLUME |  1000K|    20M|       |  1009   (1)| 00:00:13 |
8--------------------------------------------------------------------------------------------

You can see how they each compare in this table:

Solution	Cardinality	Cost
1	10,010	5,163
2	286	1,035
3	2,745	2,048
4	1,001,040	7,654

One thing that concerns me is the cardinality estimate on Solution 4. I'm not expecting 1 million unique records to be returned, but we'll see what happens when we run the query.

Now, after running each of the queries, I get the following information:

Solution	Run Time (Sec)	Rows
1	793.724	2726
2	0.462	286
3	0.504	2726
4	2.01	2726

You can see here that both Solution 2 and 3 took less than a second to run. Solution 4 took 2 seconds to run. Solution 1 took 793 seconds to run (13 minutes and 13 seconds), which was by far the longest.

Solution 1, 3 and 4 have 2726 records, which is correct because there are multiple occurrences of cities and start_dates with the same value.

Solution 2, however, had much less, as it does not show the full record.

Conclusion

So, in conclusion:

There are many ways to write this query to get the solution you need.
They each have a different performance.
Make sure you create the right indexes on these fields.

For the data I have, I would probably use solution 3, but you should test these solutions against your own data to see which performs better.

Lastly, if you enjoy the information and career advice I've been providing, sign up to my newsletter below to stay up-to-date on my articles. You'll also receive a fantastic bonus. Thanks!