FB

How to Find the Row That Has the Maximum Value for a Column in OracleDo you need to find the data for a row that has the maximum value for a specific column in Oracle? Learn how to do it and see some alternative answers in this article.

The Table Structure

Let’s say you have a table called sales_volume that looks like this:

CITY START_DATE SALES
El Paso 27/May/15 4493
Wilmington 16/Nov/15 4656
Richmond 11/May/15 3739
Irvine 23/Jan/15 1356
Berkeley 18/May/15 263
Pasadena 20/Oct/15 969
Chandler 30/Nov/15 3679
Montgomery 03/Oct/15 4744
Miramar 23/May/15 3754
Elk Grove 22/Jul/15 1684

You can see there are a range of records here with different data.

 

The Problem – Select the Row That Has the Maximum Value

Now, given you have the table above (or something like it), how could you do the following:

For each different city, find the rows that match the highest start_date value for each city.

So, I would expect an output like this:

CITY START_DATE SALES
Abilene 31/Dec/15 2611
Akron 31/Dec/15 621
Albuquerque 31/Dec/15 4397
Alexandria 31/Dec/15 3944

How can I do this? How can I find the row that has the maximum value?

Also, if there are multiple ways to do it, which is the best way? After describing all of the solutions, I’ll compare the performance of each of them.

Note: With each of the outputs below, I’ve only displayed a small subset, as the full result contains a few thousand rows.

 

Solution 1 – LEFT OUTER JOIN

The first solution uses a LEFT OUTER JOIN on another instance of the same table.

SELECT
s1.city,
s1.start_date,
s1.sales
FROM sales_volume s1
LEFT OUTER JOIN sales_volume s2 ON (s1.city = s2.city AND s1.sales < s2.sales)
WHERE s2.city IS NULL
ORDER BY city;

Result:

CITY START_DATE SALES
Abilene 31/Dec/15 2,611
Abilene 31/Dec/15 4,899
Abilene 31/Dec/15 4,177
Abilene 31/Dec/15 3,142
Abilene 31/Dec/15 4,862
Akron 31/Dec/15 621
Akron 31/Dec/15 3,442
Akron 31/Dec/15 4,939
Akron 31/Dec/15 2,249
Akron 31/Dec/15 4,929
Akron 31/Dec/15 1,640
Akron 31/Dec/15 999
Akron 31/Dec/15 2,157
Akron 31/Dec/15 4,811
Akron 31/Dec/15 862
Akron 31/Dec/15 3,664
Albuquerque 31/Dec/15 4,397
Albuquerque 31/Dec/15 398
Albuquerque 31/Dec/15 3,750
Albuquerque 31/Dec/15 4,561
Albuquerque 31/Dec/15 620
Albuquerque 31/Dec/15 3,935
Albuquerque 31/Dec/15 4,820
Albuquerque 31/Dec/15 1,640
Albuquerque 31/Dec/15 2,173
Albuquerque 31/Dec/15 3,318

This query selects all of the fields from the sales_volume table where no other row exists for the same city and with a greater start_date value.

 

Solution 2 – MAX and DENSE_RANK

Another way of getting the data you need is to use the MAX function.

SELECT
s.city,
MAX(s.start_date) KEEP (DENSE_RANK FIRST ORDER BY sales DESC) AS start_date
FROM sales_volume s
GROUP BY s.city
ORDER BY s.city;

 

Now, the MAX function is used to find the value of the highest record. It’s used as an analytical function by adding the KEEP clause at the end.

The KEEP clause, along with DENSE RANK FIRST tells the function to keep the first value that it finds when looking for the sales value. But, the ORDER BY start_date DESC part tells the function to order the fields by the start_date in descending order. This tells the MAX function to find the row that matches the highest start_date value.

However, the drawback with this solution is that it does not show the entire record.

 

Solution 3 – Correlated Subquery

Another way to use SQL to select the row with the highest value is to use MAX with a correlated subquery.

SELECT
s1.city,
s1.start_date,
s1.sales
FROM sales_volume s1
WHERE s1.sales = (
  SELECT MAX(sales)
  FROM sales_volume s2
  WHERE s1.city = s2.city
)
ORDER BY s1.city;

This query may not be very efficient, as it uses this correlated subquery. But, we will see what happens when we do the performance test at the end of this post.

 

Solution 4 – MAX and Partition By

The last solution that I’ll be demonstrating for using SQL to select a max row is this one, which uses the MAX function and the PARTITION BY clause.

SELECT
city,
start_date,
sales
FROM (
  SELECT
  city,
  start_date,
  sales,
  MAX(sales) OVER (PARTITION BY city) AS max_sales
  FROM sales_volume)
WHERE sales = max_sales
ORDER BY city;

This uses the MAX function as an analytical query. It uses an inline view (having an SQL query as the FROM clause). Within this view, it adds an extra column which is the maximum start_date value which is partitioned by city. This field has the same value for each city, which is the maximum.

Then, outside this view, in the WHERE clause, it only shows those where the start_date value is the maximum.

 

Performance Comparison

Now that we have a few different solutions, all showing the correct records, let’s compare the performance of each of them.

The table that I used only had a small number of records. Let’s expand this with some random data up to 1 million records and see how it performs.

I’ll insert the extra records, add an index on the city column and the sales, show the actual run times for each, and check the EXPLAIN PLAN for each solution.

Here are the EXPLAIN PLANs for each of my solutions.

SELECT
PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY());

 

Solution 1

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |              | 10000 |   390K|       |  5160   (1)| 00:01:02 |
|   1 |  SORT ORDER BY          |              | 10000 |   390K|   480K|  5160   (1)| 00:01:02 |
|*  2 |   FILTER                |              |       |       |       |            |          |
|*  3 |    HASH JOIN RIGHT OUTER|              | 10000 |   390K|    28M|  5054   (1)| 00:01:01 |
|   4 |     TABLE ACCESS FULL   | SALES_VOLUME |  1000K|    17M|       |  1008   (1)| 00:00:13 |
|   5 |     TABLE ACCESS FULL   | SALES_VOLUME |  1000K|    20M|       |  1009   (1)| 00:00:13 |
------------------------------------------------------------------------------------------------

 

Solution 2

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |   286 |  6292 |  1035   (4)| 00:00:13 |
|   1 |  SORT GROUP BY     |              |   286 |  6292 |  1035   (4)| 00:00:13 |
|   2 |   TABLE ACCESS FULL| SALES_VOLUME |  1000K|    20M|  1009   (1)| 00:00:13 |
-----------------------------------------------------------------------------------

 

Solution 3

--------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |  2739 |   109K|  2048   (3)| 00:00:25 |
|   1 |  SORT ORDER BY        |              |  2739 |   109K|  2048   (3)| 00:00:25 |
|*  2 |   HASH JOIN           |              |  2739 |   109K|  2047   (3)| 00:00:25 |
|   3 |    VIEW               | VW_SQ_1      |   286 |  5434 |  1034   (3)| 00:00:13 |
|   4 |     HASH GROUP BY     |              |   286 |  5148 |  1034   (3)| 00:00:13 |
|   5 |      TABLE ACCESS FULL| SALES_VOLUME |  1000K|    17M|  1008   (1)| 00:00:13 |
|   6 |    TABLE ACCESS FULL  | SALES_VOLUME |  1000K|    20M|  1009   (1)| 00:00:13 |
--------------------------------------------------------------------------------------

 

Solution 4

--------------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |              |  1000K|    55M|       |  7648   (1)| 00:01:32 |
|*  1 |  VIEW               |              |  1000K|    55M|       |  7648   (1)| 00:01:32 |
|   2 |   WINDOW SORT       |              |  1000K|    20M|    30M|  7648   (1)| 00:01:32 |
|   3 |    TABLE ACCESS FULL| SALES_VOLUME |  1000K|    20M|       |  1009   (1)| 00:00:13 |
--------------------------------------------------------------------------------------------

 

You can see how they each compare in this table:

Solution Cardinality Cost
1 10,010 5,163
2 286 1,035
3 2,745 2,048
4 1,001,040 7,654

One thing that concerns me is the cardinality estimate on Solution 4. I’m not expecting 1 million unique records to be returned, but we’ll see what happens when we run the query.

Now, after running each of the queries, I get the following information:

Solution Run Time (Sec) Rows
1 793.724 2726
2 0.462 286
3 0.504 2726
4 2.01 2726

You can see here that both Solution 2 and 3 took less than a second to run. Solution 4 took 2 seconds to run. Solution 1 took 793 seconds to run (13 minutes and 13 seconds), which was by far the longest.

Solution 1, 3 and 4 have 2726 records, which is correct because there are multiple occurrences of cities and start_dates with the same value.

Solution 2, however, had much less, as it does not show the full record.

 

Conclusion

So, in conclusion:

  • There are many ways to write this query to get the solution you need.
  • They each have a different performance.
  • Make sure you create the right indexes on these fields.

For the data I have, I would probably use solution 3, but you should test these solutions against your own data to see which performs better.

Lastly, if you enjoy the information and career advice I’ve been providing, sign up to my newsletter below to stay up-to-date on my articles. You’ll also receive a fantastic bonus. Thanks!

Get Your SQL Function Cheat Sheet Now: