Saturday, July 4, 2009

Let's look at Merge in SQL Server 2008

Hi,

As you must be aware of the various new features introduced in SQL Server 2008, one of them is MERGE statement.

What is the purpose of Merge:

•MERGE lets you INSERT, UPDATE, AND DELETE in a single operation (i.e. single pass of the data set)
•You provide a source and target table
•Conceptually, this is a full outer join between the two tables and lets you provide commands for rows that match, rows only in the source, and rows only in the target

Actually the NOLOCK and READUNCOMMITTED hints are ignored on any update to an access method to prevent corruption. The original “ignore” decision was likely a result of unintentional support left in the product in an earlier version coupled with not wanting to break customers’ scripts. MERGE just extends the practice to be consistent with I/U/D.

MERGE is in many ways a more direct representation of the internal query algebra to the user. There’s one operator to do I/U/D and there is an $ACTION column that tells the operator what to do. You can see this more easily in MERGE. It translates down to an I/U/D in the query execution operators.

Merge Operations based on :
–WHEN MATCHED
–WHEN [TARGET] NOT MATCHED
–WHEN SOURCE NOT MATCHED

You can specify commands for at least 1 and at most all of these cases and define what should happen (INSERT, UPDATE, DELETE). The Source table can be any table source, target table can be an updateable view but not be a remote table or distributed partitioned view


Merge statment is great for large tables (ex: data warehouse population) where one-pass transforms can give a performance boost. It also can simplify T-SQL logic and push it into the database engine

Let's look at some Restrictions:
–NOLOCK, READUNCOMMITED disallowed (to maintain index integrity)
–Join condition should only be on the matching columns to avoid semantic issues
–Target Table can not be a remote table or distributed partitioned view. Source table can be any row source.


Recommendation-
Experiment with this before a wide-scale deployment. It’s not intended for use in all places where I/U/D are done.

Some more facts on Merge statment:
MERGE statement is transactional
–No explicit transaction required
One Pass Through Tables
–At most a full outer join
–Matching rows = when matched
–Left-outer join rows = when target not matched
–Right-outer join rows = when source not matched


You can specify multiple WHEN clauses
–For MATCHED and SOURCE NOT MATCHED
–Only one WHEN clause for TARGET NOT MATCHED
•MERGE can be used with any table source
•A MERGE statement causes triggers to be fired once
•Rows affected includes total rows affected by all clauses

Let's look at an example:
Here is the code to create and populate the target and source tables:SET NOCOUNT ON;
-- Create Target Table
CREATE TABLE Customers (LastName VARCHAR(50), FirstName VARCHAR(50));
INSERT INTO Customers VALUES ('Doe', 'Jane');
-- Create Source Table
CREATE TABLE NewCustomers(LastName VARCHAR(50), FirstName VARCHAR(50));
INSERT INTO NewCustomers VALUES ('Doe', 'John');
INSERT INTO NewCustomers VALUES ('Smith', 'Doris');

Above we have created a Customers table with 1 record, and a NewCustomers table with 2 different records. The records in the NewCustomers table will be used to update and insert records in to the Customers table. The first record in the NewCustomers table is used to update the existing Customers record, where as the second record in the NewCustomers table will be inserted into the Customers table. To do the insert and update, we will use the following MERGE statement:-- Update and Insert into Customers
MERGE Customers AS C
USING NewCustomers AS NC
ON C.LastName = NC. LastName
WHEN MATCHED THEN
UPDATE SET C.FirstName = NC.FirstName
WHEN NOT MATCHED THEN
INSERT (LastName, FirstName) VALUES (NC.LastName,NC.FirstName);
The target table is identified by naming it immediately following MERGE statement, “Customers”. MERGE statement uses the “USING” clause to identify the source of the updates and inserts, which is the NewCustomers table. In order to identify which records need to update and which to insert we need to identify a key column to match on between the Customers and NewCustomers table. To accomplish this matching of records between the two tables the MERGE statement uses the “ON” clause just as you would do when you are joining two tables. we are joining the source and target tables based on the LastName column in source and target tables. The different “WHEN” criteria of the MERGE statement identify the conditions for when an update or insert is to be performed. When a record is found in both source (NewCustomers) and target (Customers) tables then an UPDATE statement is performed to update on the FirstName column. This condition meets the “WHEN MATCHED” criteria above. When a join does not find a record in the target (Customers) table for a record in the source (newCustomer), then this is the “WHEN NOT MATCHED” criteria is executed. When this occurs an INSERT statement is performed to insert the LastName and FirstName value into the Customers table. After I have run my MERGE statement, my updated Customers table contains the following records:

LastName FirstName
--------------------------------------------------
Doe John
Smith Doris

By looking at the output, the MERGE statement changed the FirstName column of the “Doe” record from “Jane” to “John”. The MERGE did an UPDATE to accomplish this by using the “MATCHED” criteria. The second record was created when the “NOT MATCH” criteria inserted a record.


What are the benefits of Using MERGE

The MERGE statement simplifies the amount of work it takes to implement insert and update logic. Now with a single statement you can easily implement insert, update, and delete logic to handle criteria for maintaining a table. The MERGE statement handles all the joining of the source and target. This minimizes the amount of code you need to write to handle merge logic.


The following example uses MERGE to update the ProductInventory table in the AdventureWorks sample database on a daily basis, based on orders that are processed in the SalesOrderDetail table. The Quantity column of the ProductInventory table is updated by subtracting the number of orders placed each day for each product in the SalesOrderDetail table. If the number of orders for a product drops the inventory level of a product to 0 or less, the row for that product is deleted from the ProductInventory table.

Take a look at following example:
USE AdventureWorks;
GO
IF OBJECT_ID (N'Production.usp_UpdateInventory', N'P') IS NOT NULL DROP PROCEDURE Production.usp_UpdateInventory;
GO

CREATE PROCEDURE Production.usp_UpdateInventory
@OrderDate datetime
AS
MERGE Production.ProductInventory AS target
USING (SELECT ProductID, SUM(OrderQty) FROM Sales.SalesOrderDetail AS sod
JOIN Sales.SalesOrderHeader AS soh
ON sod.SalesOrderID = soh.SalesOrderID
AND soh.OrderDate = @OrderDate
GROUP BY ProductID) AS source (ProductID, OrderQty)
ON (target.ProductID = source.ProductID)
WHEN MATCHED AND target.Quantity - source.OrderQty <= 0
THEN DELETE
WHEN MATCHED 
THEN UPDATE SET target.Quantity = target.Quantity - source.OrderQty, 
target.ModifiedDate = GETDATE()
OUTPUT $action, Inserted.ProductID, Inserted.Quantity, Inserted.ModifiedDate, Deleted.ProductID,
Deleted.Quantity, Deleted.ModifiedDate;
GO

EXECUTE Production.usp_UpdateInventory '20030501'

Here $action means UPDATE, INSERT or DELETE. $action is specific to MERGE statement only. It is not available when OUTPUT command is used with a simple INSERT, UPDATE or DELETE command.


Enjoy.....

No comments: