FB

How to Find the Row That Has the Maximum Value for a Column in OracleDo you need to find the data for a row that has the maximum value for a specific column in Oracle? Learn how to do it and see some alternative answers in this article.

The Table Structure

Let’s say you have a table called sales_volume that looks like this:

CITYSTART_DATESALES
El Paso27/May/154493
Wilmington16/Nov/154656
Richmond11/May/153739
Irvine23/Jan/151356
Berkeley18/May/15263
Pasadena20/Oct/15969
Chandler30/Nov/153679
Montgomery03/Oct/154744
Miramar23/May/153754
Elk Grove22/Jul/151684

You can see there are a range of records here with different data.

 

The Problem – Select the Row That Has the Maximum Value

Now, given you have the table above (or something like it), how could you do the following:

For each different city, find the rows that match the highest start_date value for each city.

So, I would expect an output like this:

CITYSTART_DATESALES
Abilene31/Dec/152611
Akron31/Dec/15621
Albuquerque31/Dec/154397
Alexandria31/Dec/153944

How can I do this? How can I find the row that has the maximum value?

Also, if there are multiple ways to do it, which is the best way? After describing all of the solutions, I’ll compare the performance of each of them.

Note: With each of the outputs below, I’ve only displayed a small subset, as the full result contains a few thousand rows.

 

Solution 1 – LEFT OUTER JOIN

The first solution uses a LEFT OUTER JOIN on another instance of the same table.

SELECT
s1.city,
s1.start_date,
s1.sales
FROM sales_volume s1
LEFT OUTER JOIN sales_volume s2 ON (s1.city = s2.city AND s1.sales < s2.sales)
WHERE s2.city IS NULL
ORDER BY city;

Result:

CITYSTART_DATESALES
Abilene31/Dec/152,611
Abilene31/Dec/154,899
Abilene31/Dec/154,177
Abilene31/Dec/153,142
Abilene31/Dec/154,862
Akron31/Dec/15621
Akron31/Dec/153,442
Akron31/Dec/154,939
Akron31/Dec/152,249
Akron31/Dec/154,929
Akron31/Dec/151,640
Akron31/Dec/15999
Akron31/Dec/152,157
Akron31/Dec/154,811
Akron31/Dec/15862
Akron31/Dec/153,664
Albuquerque31/Dec/154,397
Albuquerque31/Dec/15398
Albuquerque31/Dec/153,750
Albuquerque31/Dec/154,561
Albuquerque31/Dec/15620
Albuquerque31/Dec/153,935
Albuquerque31/Dec/154,820
Albuquerque31/Dec/151,640
Albuquerque31/Dec/152,173
Albuquerque31/Dec/153,318

This query selects all of the fields from the sales_volume table where no other row exists for the same city and with a greater start_date value.

 

Solution 2 – MAX and DENSE_RANK

Another way of getting the data you need is to use the MAX function.

SELECT
s.city,
MAX(s.start_date) KEEP (DENSE_RANK FIRST ORDER BY sales DESC) AS start_date
FROM sales_volume s
GROUP BY s.city
ORDER BY s.city;

 

Now, the MAX function is used to find the value of the highest record. It’s used as an analytical function by adding the KEEP clause at the end.

The KEEP clause, along with DENSE RANK FIRST tells the function to keep the first value that it finds when looking for the sales value. But, the ORDER BY start_date DESC part tells the function to order the fields by the start_date in descending order. This tells the MAX function to find the row that matches the highest start_date value.

However, the drawback with this solution is that it does not show the entire record.

 

Solution 3 – Correlated Subquery

Another way to use SQL to select the row with the highest value is to use MAX with a correlated subquery.

SELECT
s1.city,
s1.start_date,
s1.sales
FROM sales_volume s1
WHERE s1.sales = (
  SELECT MAX(sales)
  FROM sales_volume s2
  WHERE s1.city = s2.city
)
ORDER BY s1.city;

This query may not be very efficient, as it uses this correlated subquery. But, we will see what happens when we do the performance test at the end of this post.

 

Solution 4 – MAX and Partition By

The last solution that I’ll be demonstrating for using SQL to select a max row is this one, which uses the MAX function and the PARTITION BY clause.

SELECT
city,
start_date,
sales
FROM (
  SELECT
  city,
  start_date,
  sales,
  MAX(sales) OVER (PARTITION BY city) AS max_sales
  FROM sales_volume)
WHERE sales = max_sales
ORDER BY city;

This uses the MAX function as an analytical query. It uses an inline view (having an SQL query as the FROM clause). Within this view, it adds an extra column which is the maximum start_date value which is partitioned by city. This field has the same value for each city, which is the maximum.

Then, outside this view, in the WHERE clause, it only shows those where the start_date value is the maximum.

 

Performance Comparison

Now that we have a few different solutions, all showing the correct records, let’s compare the performance of each of them.

The table that I used only had a small number of records. Let’s expand this with some random data up to 1 million records and see how it performs.

I’ll insert the extra records, add an index on the city column and the sales, show the actual run times for each, and check the EXPLAIN PLAN for each solution.

Here are the EXPLAIN PLANs for each of my solutions.

SELECT
PLAN_TABLE_OUTPUT
FROM TABLE(DBMS_XPLAN.DISPLAY());

 

Solution 1

------------------------------------------------------------------------------------------------
| Id  | Operation               | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |              | 10000 |   390K|       |  5160   (1)| 00:01:02 |
|   1 |  SORT ORDER BY          |              | 10000 |   390K|   480K|  5160   (1)| 00:01:02 |
|*  2 |   FILTER                |              |       |       |       |            |          |
|*  3 |    HASH JOIN RIGHT OUTER|              | 10000 |   390K|    28M|  5054   (1)| 00:01:01 |
|   4 |     TABLE ACCESS FULL   | SALES_VOLUME |  1000K|    17M|       |  1008   (1)| 00:00:13 |
|   5 |     TABLE ACCESS FULL   | SALES_VOLUME |  1000K|    20M|       |  1009   (1)| 00:00:13 |
------------------------------------------------------------------------------------------------

 

Solution 2

-----------------------------------------------------------------------------------
| Id  | Operation          | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |              |   286 |  6292 |  1035   (4)| 00:00:13 |
|   1 |  SORT GROUP BY     |              |   286 |  6292 |  1035   (4)| 00:00:13 |
|   2 |   TABLE ACCESS FULL| SALES_VOLUME |  1000K|    20M|  1009   (1)| 00:00:13 |
-----------------------------------------------------------------------------------

 

Solution 3

--------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |  2739 |   109K|  2048   (3)| 00:00:25 |
|   1 |  SORT ORDER BY        |              |  2739 |   109K|  2048   (3)| 00:00:25 |
|*  2 |   HASH JOIN           |              |  2739 |   109K|  2047   (3)| 00:00:25 |
|   3 |    VIEW               | VW_SQ_1      |   286 |  5434 |  1034   (3)| 00:00:13 |
|   4 |     HASH GROUP BY     |              |   286 |  5148 |  1034   (3)| 00:00:13 |
|   5 |      TABLE ACCESS FULL| SALES_VOLUME |  1000K|    17M|  1008   (1)| 00:00:13 |
|   6 |    TABLE ACCESS FULL  | SALES_VOLUME |  1000K|    20M|  1009   (1)| 00:00:13 |
--------------------------------------------------------------------------------------

 

Solution 4

--------------------------------------------------------------------------------------------
| Id  | Operation           | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |              |  1000K|    55M|       |  7648   (1)| 00:01:32 |
|*  1 |  VIEW               |              |  1000K|    55M|       |  7648   (1)| 00:01:32 |
|   2 |   WINDOW SORT       |              |  1000K|    20M|    30M|  7648   (1)| 00:01:32 |
|   3 |    TABLE ACCESS FULL| SALES_VOLUME |  1000K|    20M|       |  1009   (1)| 00:00:13 |
--------------------------------------------------------------------------------------------

 

You can see how they each compare in this table:

SolutionCardinalityCost
110,0105,163
22861,035
32,7452,048
41,001,0407,654

One thing that concerns me is the cardinality estimate on Solution 4. I’m not expecting 1 million unique records to be returned, but we’ll see what happens when we run the query.

Now, after running each of the queries, I get the following information:

SolutionRun Time (Sec)Rows
1793.7242726
20.462286
30.5042726
42.012726

You can see here that both Solution 2 and 3 took less than a second to run. Solution 4 took 2 seconds to run. Solution 1 took 793 seconds to run (13 minutes and 13 seconds), which was by far the longest.

Solution 1, 3 and 4 have 2726 records, which is correct because there are multiple occurrences of cities and start_dates with the same value.

Solution 2, however, had much less, as it does not show the full record.

 

Conclusion

So, in conclusion:

  • There are many ways to write this query to get the solution you need.
  • They each have a different performance.
  • Make sure you create the right indexes on these fields.

For the data I have, I would probably use solution 3, but you should test these solutions against your own data to see which performs better.

Lastly, if you enjoy the information and career advice I’ve been providing, sign up to my newsletter below to stay up-to-date on my articles. You’ll also receive a fantastic bonus. Thanks!

Get Your SQL Cheat Sheets Now: