SQL Merge: The Complete Guide

May 22, 2018 · 12 min read · sql ·

Do you need to insert or update data depending on if it already exists?

You can do this with the SQL MERGE statement.

What is the MERGE Statement?

The MERGE statement is a type of statement that lets you either insert data or update data, depending on if it already exists. It lets you merge two tables in SQL. It's a bit smarter than an INSERT INTO SELECT statement.

SQL MERGE is available in Oracle, SQL Server, and Postgres (not MySQL).

You can analyse records to see if they match between a source and a target table. If the record is found, then an update can be performed. If the record is not found, then an insert can be performed.

It's often called an “upsert”, which is a combination of the word “update” and “insert”, which is what the MERGE statement does.

Why is the MERGE Statement Useful?

The MERGE statement is useful for ETL processes, where you need to move large amounts of data.

It's an alternative to a combination of INSERT and UPDATE statements, and it's faster, and easier to code.

It's often useful when you need to synchronise data regularly from a source table (e.g. daily, weekly, monthly), and you want to update some record and insert new records.

MERGE Statement Syntax

The syntax of a MERGE statement in SQL looks like this:

1MERGE INTO table_name
2USING table_name
3ON (condition)
4WHEN MATCHED THEN update_clause
5DELETE where_clause
6WHEN NOT MATCHED THEN insert_clause
7[LOG ERRORS log_errors_clause reject_limit <integer | unlimited> ]

This syntax includes:

Where the data comes from
Where the data is being inserted or updated to
How to match the data
What happens when a match is found
What happens when a match is not found
How to log errors, if needed

Let's look at each of these clauses.

MERGE INTO table_name

This clause lets you specify the target table name or view name that you are updating or inserting data into.

If you're using a view, the view needs to be an updatable view.

You also need to have INSERT and UPDATE privileges on this table to be able to perform a MERGE on it.

USING table_name

This is where you define where the data comes from. It can be a table, a view, or a subquery.

ON (condition)

This is where you specify the condition that determines if the MERGE statement updates or inserts a row.

Each row in the source table is checked against this condition. If it is true, then the record in the target table is updated, based on what you specify in the update clause. If it is false, then the record is inserted, based on what you specify in the insert clause.

This is often done on some kind of ID field, but can be done on other fields that identify the record.

WHEN MATCHED THEN update_clause

This clause specifies the columns and their new values in the target table.

The WHEN MATCHED means that this clause only occurs if the ON condition is TRUE for a particular row. For example, if the source row already exists in the target table.

You can add a WHERE clause into this update clause to update data if another condition is true. If this condition is true, the data is updated, otherwise no data is updated.

DELETE where_clause

This clause lets you delete data from the target table after it is updated, if it meets a condition you specify.

This allows you to clean up data while you're updating it.

It only applies to rows that meet the WHERE clause specified here AND the ON condition specified earlier. If a row meets this where clause but not the ON condition, then it is not deleted.

Also, this WHERE clause looks at the value after the update statement occurs, not the old value before it was updated.

It saves you from having to write a separate delete statement.

WHEN NOT MATCHED THEN insert_clause

This clause lets you specify values to insert into the target table, if the ON condition is not met.

You should specify the column names and the values here, just like a regular INSERT statement. If you omit the column names, then the columns in the table must match those in the VALUES clause - which can be subject to change in the future!

A WHERE clause can be added here, to allow the database to only insert data that matches the WHERE clause as well as returning false from the ON condition. If the WHERE clause is met but the ON condition returns true, then the data is not inserted.

LOG ERRORS

This clause lets you log any errors that are encountered into a separate table, for you to analyse later.

The syntax of this clause looks like this:

1LOG ERRORS [INTO table_name] [ (simple_expression) ] [REJECT LIMIT integer|UNLIMITED ]

What do all of these keywords mean?

INTO table_name: This is where you specify the name of the table you want to log errors into. This is an optional clause. If you don't specify the INTO table_name, then the database will use the default error logging table, where the name is generated by the DBMS_ERRLOG package. The default name is ERR$_, followed by the first 25 characters of the table being updated.
simple_expression: This is used to identify errors from this statement in the logging table. It can be a literal text or number value, or a general SQL expression.
REJECT LIMIT: This is where you can specify an upper limit for the number of errors to be logged before the MERGE statement ends and rolls back any changes. The default reject limit is 0.

There are some restrictions on this error logging feature:

The following conditions cause the MERGE statement to fail and roll back, and not use the error logging capability:
- Deferred constraints that are violated
- Any MERGE operation that raises a unique constraint or index violation
You can't track errors in the error logging table ofr LONG, LOB, or object type columns.
If the error logging table contains a column of an unsupported type and that column name relates to a column in the target table, then the merge statement will fail when it is parsed.

Merge SQL Examples

Right now you might be thinking this statement seems helpful but complicated.

So, let's take a look at some examples to help explain it.

We're going to use a customer table that stores names, states, and emails for customers. We also have a customer_import table which is where our source data comes from.

This is a simplified scenario of loading a table into a data warehouse, where you have a source table from the main system and a target table of the data warehouse table.

In this example, we have this table:

1CREATE TABLE customer (
2  customer_id NUMBER(5),
3  first_name VARCHAR2(100),
4  last_name VARCHAR2(100),
5  address_state VARCHAR2(10),
6  email_address VARCHAR2(350),
7  CONSTRAINT pk_cust PRIMARY KEY (customer_id)
8);

Now we'll insert some data.

 1INSERT INTO customer (customer_id, first_name, last_name, address_state, email_address)
 2VALUES (1, 'Teresa', 'Hudson', 'NY', 'thudson@abc.com');
 3INSERT INTO customer (customer_id, first_name, last_name, address_state, email_address) 
 4VALUES (2, 'Fred', 'Montgomery', 'CA', 'fmont@gmail.com');
 5INSERT INTO customer (customer_id, first_name, last_name, address_state, email_address) 
 6VALUES (3, 'Lois', 'Lawson', 'OR', 'lois_law@outlook.com');
 7INSERT INTO customer (customer_id, first_name, last_name, address_state, email_address) 
 8VALUES (4, 'Alice', 'Perry', 'SC', 'aliceperry');
 9INSERT INTO customer (customer_id, first_name, last_name, address_state, email_address) 
10VALUES (5, 'Ralph', 'Montgomery', 'TX', 'ralph_mont25@gmail.com');
11INSERT INTO customer (customer_id, first_name, last_name, address_state, email_address) 
12VALUES (6, 'Dorothy', 'Armstrong', 'OR', 'abc123@abc.com');
13INSERT INTO customer (customer_id, first_name, last_name, address_state, email_address) 
14VALUES (7, 'Fred', 'Wallace', 'NY', 'wallacef@testwebsite.co.uk');
15INSERT INTO customer (customer_id, first_name, last_name, address_state, email_address) 
16VALUES (8, 'Joseph', 'Bell', 'FL', 'jbell@comm.edu');
17INSERT INTO customer (customer_id, first_name, last_name, address_state, email_address) 
18VALUES (9, 'Lois', 'Martinez', 'CALIF', 'loismar@awe.com');
19INSERT INTO customer (customer_id, first_name, last_name, address_state, email_address) 
20VALUES (10, 'Robert', 'Rice', 'IN', 'robrice123');

You might notice that some of the state values and email addresses don't meet the format you might expect. We'll see how these change during the MERGE process.

Now, let's say we have a customer_import table using the same table syntax.

1CREATE TABLE customer_import (
2  customer_id NUMBER(5),
3  first_name VARCHAR2(100),
4  last_name VARCHAR2(100),
5  address_state VARCHAR2(10),
6  email_address VARCHAR2(350)
7);

This table is populated with the following data:

 1INSERT INTO customer_import (customer_id, first_name, last_name, address_state, email_address) 
 2VALUES (15, 'John', 'Stock', 'ND', 'js1980@outlook.com');
 3INSERT INTO customer_import (customer_id, first_name, last_name, address_state, email_address) 
 4VALUES (12, 'Joseph', 'Bell', 'OR', 'thebell@yahoo.com');
 5INSERT INTO customer_import (customer_id, first_name, last_name, address_state, email_address) 
 6VALUES (3, 'Lois', 'Lawson', 'WA', 'lois_law@outlook.com');
 7INSERT INTO customer_import (customer_id, first_name, last_name, address_state, email_address) 
 8VALUES (8, 'Joseph', 'Bell', 'FL', 'jbell@gmail.com');
 9INSERT INTO customer_import (customer_id, first_name, last_name, address_state, email_address) 
10VALUES (1, 'Teresa', 'Hudson', 'NY', 'thudson@abc.com');
11INSERT INTO customer_import (customer_id, first_name, last_name, address_state, email_address) 
12VALUES (4, 'Alice', 'Perry', 'SC', 'aliceperry@gmail.com');

It doesn't matter that the values aren't in any particular order.

We now have our two tables: customer, and customer_import. They look like this:

customer:

customer _id	first_name	last_name	address _state	email_address
1	Teresa	Hudson	NY	thudson@abc.com
2	Fred	Montgomery	CA	fmont@gmail.com
3	Lois	Lawson	OR	lois_law@outlook.com
4	Alice	Perry	SC	aliceperry
5	Ralph	Montgomery	TX	ralph_mont25@gmail.com
6	Dorothy	Armstrong	OR	abc123@abc.com
7	Fred	Wallace	NY	wallacef@testwebsite.co.uk
8	Joseph	Bell	FL	jbell@comm.edu
9	Lois	Martinez	CALIF	loismar@awe.com
10	Robert	Rice	IN	robrice123

customer_import:

customer _id	first_name	last_name	address _state	email_address
15	John	Stock	ND	js1980@outlook.com
12	Joseph	Bell	OR	thebell@yahoo.com
3	Lois	Lawson	WA	lois_law@outlook.com
8	Joseph	Bell	FL	jbell@gmail.com
1	Teresa	Hudson	NY	thudson@abc.com
4	Alice	Perry	SC	aliceperry@gmail.com

Let's use the MERGE statement to load the data from the customer_import table into the customer table.

We'll use the customer_id field to see if the record already exists. We could use any of the other fields, such as first_name and last_name, but in this example the customer_id is the identifier.

So, our statement would look like this:

 1MERGE INTO customer c
 2USING customer_import i
 3ON (c.customer_id = i.customer_id)
 4WHEN MATCHED THEN
 5UPDATE SET
 6c.first_name = i.first_name,
 7c.last_name = i.last_name,
 8c.address_state = i.address_state,
 9c.email_address = i.email_address
10WHEN NOT MATCHED THEN
11INSERT (c.customer_id, c.first_name, c.last_name, c.address_state, c.email_address)
12VALUES (i.customer_id, i.first_name, i.last_name, i.address_state, i.email_address);

Now, let's see the updated customer table.

customer _id	first_name	last_name	address _state	email_address
1	Teresa	Hudson	NY	thudson@abc.com
2	Fred	Montgomery	CA	fmont@gmail.com
3	Lois	Lawson	WA	lois_law@outlook.com
4	Alice	Perry	SC	aliceperry@gmail.com
5	Ralph	Montgomery	TX	ralph_mont25@gmail.com
6	Dorothy	Armstrong	OR	abc123@abc.com
7	Fred	Wallace	NY	wallacef@testwebsite.co.uk
8	Joseph	Bell	FL	jbell@gmail.com
9	Lois	Martinez	CALIF	loismar@awe.com
10	Robert	Rice	IN	robrice123
12	Joseph	Bell	OR	thebell@yahoo.com
15	John	Stock	ND	js1980@outlook.com

I have highlighted the data that is new or changed here.

The changes that came from the customer_import table can be explained:

ID 15 was inserted as it's a new row (no row with ID 15 exists).
ID 12 was inserted as it's a new row (no row with ID 12 exists). Even though it's the same name as ID 8, it's a new row because the matching is done on customer_id and not on the name.
ID 3 had the address_state updated as it was the only field that was different.
ID 8 had the email_address updated.
ID 1 was in the import table but no data changed, as it was already the same.
ID 4 had the email_address updated as well.

So with this simple example, we can see that some data is updated and other data is inserted, based on if a match is found.

Insert Only and Update Only

Can you do an insert-only or an update-only MERGE statement?

Yes, you can. There are two ways to do this:

Omit one of the clauses
Use a condition that is always true or always false

Why would you do this as a MERGE instead of an INSERT or UPDATE statement?

Well, if you only want to update data that already exists, or insert data that does not exist, then you might find it easier to write an SQL statement using MERGE than writing an INSERT or an UPDATE statement that looks for matches using a subquery.

Let's look at the two methods.

Method 1: Omit one of the clauses

Because the WHEN MATCHED and WHEN NOT MATCHED clauses are optional, one of them can be omitted from the MERGE statement.

If you leave out the WHEN MATCHED clause, it means no data will be updated, and it will be an insert-only MERGE statement.

If you leave out the WHEN NOT MATCHED clause, it means no data will be inserted, and it will be an update-only MERGE statement.

Using the previous example, writing an insert-only MERGE statement would look like this:

1MERGE INTO customer c
2USING customer_import i
3ON (c.customer_id = i.customer_id)
4WHEN NOT MATCHED THEN
5INSERT (c.customer_id, c.first_name, c.last_name, c.address_state, c.email_address)
6VALUES (c.customer_id, c.first_name, c.last_name, c.address_state, c.email_address);

Writing an update-only MERGE statement would look like this:

1MERGE INTO customer c
2USING customer_import i
3ON (c.customer_id = i.customer_id)
4WHEN MATCHED THEN
5UPDATE SET
6c.first_name = i.first_name,
7c.last_name = i.last_name,
8c.address_state = i.address_state,
9c.email_address = i.email_address;

Method 2: Use a condition that's always true or false

Another way to write a MERGE statement that only inserts or updates is to use a condition in the MERGE statement that always returns true or false.

If the condition always returns true, then the WHEN MATCHED will always run. Alternatively, if the condition always returns false, then the WHEN NOT MATCHED will always run.

An example of a true condition would be this:

1MERGE INTO customer
2USING customer_import
3ON (1=1)

An example of a false condition would be this:

1MERGE INTO customer
2USING customer_import
3ON (1=0)

What is the advantage of writing it this way? There is no join performed to the second table, which means it could perform faster.

Can the Database Update Multiple Tables as Part of the MERGE Statement?

No, this is not possible. Both the MERGE statement and the UPDATE statement only allow you to update one table as part of a query.

To update two tables, you could write some procedural code to get the data you need to update, and update two tables inside a loop.

Things to Be Aware Of With the SQL MERGE Statement

There are a few things to be aware of if you're using the MERGE statement in SQL.

You need to have the INSERT and UPDATE privilege on the target table, and the SELECT privilege on the source table for the MERGE statement to work.
You can't update any of the columns you are matching on in the ON clause. Using the example above, if you try to update the customer_id, you'll get an error. In Oracle, this is: ORA-38104: Columns referenced in the ON clause cannot be updated.
You can only update a row in the target table once.
Oracle does not implement fine-grained access control during MERGE statements.
The MERGE statement will record the row as “merged” if the source and target record is identical and the merge condition is true.
MERGE does not tell you the number of rows that have been inserted or updated. It only shows the total number of rows merged.

Conclusion

The MERGE statement is quite useful when updating records in a table from another table. It's often used in ETL or data warehousing scenarios but is helpful whenever you need to perform an insert and update based on a condition.