Archive for July, 2011

Microsoft SQL Server 2008/2005/2000 Most Wanted Features

July 24th, 2011

I could have named this “most annoying limitations” as well:

SQL Server 2008

 
No automatic cascading updates of foreign keys with multiple references.

The only workaround for this common scenario is to handle the second column with triggers. Ouch. The restriction even applies when the foreign keys pointing to the same primary key are not in the same table, but in different tables that are themselves linked by FK relationships.
The following will not work. It still fails if we specify ON UPDATE SET NULL instead of ON UPDATE CASCADE for the second USER_ID foreign key.

-- Two foreign keys in ACTIONS point to the same primary key in USERS
CREATE TABLE USERS ( ID INT PRIMARY KEY IDENTITY( 0, 1 ), SURNAME NVARCHAR( 50 ) )
CREATE TABLE ACTIONS ( 
    ID INT PRIMARY KEY IDENTITY( 0, 1 ), 
    CREATING_USER_ID INT FOREIGN KEY REFERENCES USERS( ID ) ON UPDATE CASCADE, 
    -- The second cascading reference to USERS( ID ) is rejected
    EXECUTING_USER_ID INT FOREIGN KEY REFERENCES USERS( ID ) ON UPDATE CASCADE,
    VALUE1 INT )

This fails with the error: “Introducing FOREIGN KEY constraint ‘XXX’ on table ‘ACTIONS’ may cause cycles or multiple cascade paths. Specify ON DELETE NO ACTION or ON UPDATE NO ACTION, or modify other FOREIGN KEY constraints.” The only action permitted for the second foreign key is ON UPDATE NO ACTION.
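For comparison, here is a minimal sketch of the only variant that is accepted: the first foreign key cascades, the second one is declared with ON UPDATE NO ACTION. The table names USERS2 and ACTIONS2 are made up for this illustration, and the key of USERS2 is deliberately not an IDENTITY column so that it could actually be updated.

```sql
-- Hypothetical tables, named differently to avoid clashing with the example above
CREATE TABLE USERS2 ( ID INT PRIMARY KEY, SURNAME NVARCHAR( 50 ) )
CREATE TABLE ACTIONS2 (
    ID INT PRIMARY KEY IDENTITY( 0, 1 ),
    -- Only one reference to USERS2( ID ) may cascade
    CREATING_USER_ID INT FOREIGN KEY REFERENCES USERS2( ID ) ON UPDATE CASCADE,
    -- The second reference has to fall back to NO ACTION
    EXECUTING_USER_ID INT FOREIGN KEY REFERENCES USERS2( ID ) ON UPDATE NO ACTION,
    VALUE1 INT )
```

With this definition, changing a USERS2.ID value that is still referenced by EXECUTING_USER_ID is rejected by the constraint, which is why the second column ends up needing trigger logic.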

Using Table Valued Parameters With Any Type

TVPs offer big benefits when sending bulk data from a client application to SQL Server. However, they can only be used in conjunction with user-defined table types. It is not possible to use a TVP without declaring such a type first, which means additional changes to the database schema and its permissions for every new usage of a TVP.
Supported example:

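-- T-SQL: the table type must be declared (and granted) up front
CREATE TYPE dbo.MyType AS TABLE( MyColumn INT ); 
GRANT EXEC ON TYPE::dbo.MyType TO MyRole;

// C#: pass the records as a structured parameter that names the declared type
SqlParameter UrlParam = cmd.Parameters.AddWithValue( "@MyTVP", records );
UrlParam.SqlDbType = SqlDbType.Structured;
UrlParam.TypeName = "dbo.MyType";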

More flexible solution, but not supported:

SqlParameter UrlParam = cmd.Parameters.AddWithValue( "@MyTVP", records );
UrlParam.SqlDbType = SqlDbType.Structured;
UrlParam.TypeName = "TABLE( MyColumn INT )";
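To round off the supported variant, the server side might look roughly like the following sketch. It assumes that dbo.MyType from the supported example already exists; the procedure name dbo.spSumMyType and the variable @rows are invented for this illustration. GO is the batch separator of the client tools, not a T-SQL statement.

```sql
-- Hypothetical procedure; a TVP parameter must be declared READONLY
CREATE PROCEDURE dbo.spSumMyType @MyTVP dbo.MyType READONLY AS
    SELECT SUM( MyColumn ) AS TOTAL FROM @MyTVP;
GO

-- Calling the procedure from T-SQL also goes through the declared type
DECLARE @rows dbo.MyType;
INSERT INTO @rows ( MyColumn ) VALUES ( 1 ), ( 2 ), ( 3 );
EXEC dbo.spSumMyType @MyTVP = @rows;
```

Even here, the parameter and the variable can only be declared by naming the type, never by giving an inline TABLE( … ) definition.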

Declaring Variables With Block Scope

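It is as if you were forced to work with global variables, as in the good old times of programming. A variable declared inside a BEGIN … END block is visible for the rest of the batch or procedure. The following looks weird, but will actually compile and run, although the block containing the declaration and the assignment is never executed; the final SELECT simply returns NULL: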

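CREATE PROCEDURE spTEST AS
IF 0 = 1
BEGIN
    -- This block never runs, yet @v is declared for the whole procedure
    DECLARE @v INT
    SET @v = 0
END
-- Compiles and runs; returns NULL because the SET above was skipped
SELECT @v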
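The same lack of block scope shows up in loops. The following is a small sketch with made-up variable names (@i, @counter): a variable declared inside the loop body is not re-created on each iteration, so it keeps its value and remains visible after the loop.

```sql
DECLARE @i INT
SET @i = 0
WHILE @i < 3
BEGIN
    -- Declared once for the whole batch, not once per iteration
    DECLARE @counter INT
    SET @counter = ISNULL( @counter, 0 ) + 1
    SET @i = @i + 1
END
-- @counter is still in scope here and holds the accumulated value 3
SELECT @counter
```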

CREATE OR REPLACE / CREATE OR ALTER

There is no CREATE OR REPLACE / CREATE OR ALTER for user-defined objects. This makes a developer switch from ALTER PROCEDURE to CREATE PROCEDURE and back every morning after the dev database has been wiped clean.

There is a workaround, however it is a bad one:

-- Object names are passed as string literals, so use single quotes
IF OBJECT_ID( 'dbo.mytable', 'U' ) IS NOT NULL
    DROP TABLE dbo.mytable
CREATE TABLE dbo.mytable ...

In this case we do not need to modify our code depending on whether the object exists or not. But what if the object exists and already has permissions assigned to it? They are lost along with the dropped object. Go figure.
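For procedures, a commonly used alternative avoids the DROP altogether: create an empty stub only if the object is missing, then always ALTER it. This is a sketch under the assumption that the object is a stored procedure; dbo.spTEST and its body are placeholders. Since the procedure is never dropped, existing permissions stay in place.

```sql
-- Create a do-nothing stub only when the procedure does not exist yet.
-- EXEC is needed because CREATE PROCEDURE must be the first statement in a batch.
IF OBJECT_ID( 'dbo.spTEST', 'P' ) IS NULL
    EXEC( 'CREATE PROCEDURE dbo.spTEST AS RETURN 0' );
GO

-- From here on the script can always use ALTER
ALTER PROCEDURE dbo.spTEST AS
    SELECT 1 AS RESULT;
GO
```

It is still more ceremony than a single CREATE OR ALTER statement would be, and it does not help with tables.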

SQL Server 2005

 
Capturing non-inserted Columns

When doing a mass insert using INSERT … SELECT, it is not possible to capture columns from the source or destination table which are not part of the insert. Being able to do so would be especially useful when inserting into a table with an IDENTITY PRIMARY KEY. SQL Server 2008 improves this by introducing the MERGE statement. For a workaround, see my post on this subject.

Variable Declaration and Assignment

Combined declaration and assignment is only available from SQL Server 2008 onwards:

DECLARE @v INT = 0;

Instead of:

DECLARE @v INT
SET @v = 0

SQL Server 2000

 
Builtin Paging in Result Sets

Paging has to be done the hard way in SQL Server 2000. SQL Server 2005 resolves this by introducing the ROW_NUMBER() OVER (…) expression:

WITH CTE AS ( SELECT *, ROW_NUMBER() OVER ( ORDER BY id ASC ) AS ROW FROM syscolumns )
SELECT * FROM CTE WHERE ROW BETWEEN 13 AND 26
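In practice the row range is usually derived from a page number and a page size. The following sketch shows that arithmetic for SQL Server 2005; the variable names @PageNumber and @PageSize are my own, and the additional sort column colid is there only to make the row order deterministic.

```sql
DECLARE @PageNumber INT, @PageSize INT
SET @PageNumber = 2    -- pages are counted from 1
SET @PageSize = 10;    -- the statement before WITH must end with a semicolon

WITH CTE AS (
    SELECT id, colid, name,
           ROW_NUMBER() OVER ( ORDER BY id ASC, colid ASC ) AS ROW
    FROM syscolumns )
SELECT * FROM CTE
WHERE ROW BETWEEN ( @PageNumber - 1 ) * @PageSize + 1 AND @PageNumber * @PageSize
ORDER BY ROW
```

With these values the query returns rows 11 to 20, provided the table has that many rows.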

A workaround for SQL Server 2000 uses a temporary table with an IDENTITY column, assuming that row numbers start with 1:

CREATE TABLE #TMP( ID1 INT PRIMARY KEY IDENTITY( 1, 1 ), ID2 INT )
INSERT INTO #TMP ( ID2 ) SELECT id FROM sysobjects ORDER BY crdate
SELECT o.* FROM sysobjects AS o JOIN #TMP AS t ON o.id = t.ID2 WHERE t.ID1 BETWEEN 3 AND 9
DROP TABLE #TMP

Merging Inserted Data Using OUTPUT in SQL Server 2005

July 22nd, 2011

Warning: there is evidence that SQL Server 2005 does not preserve sort order in all cases, contrary to Microsoft’s statements in the Knowledge Base and here.

Although both articles state that the identity generation order of an insert between two tables in the same database is preserved, we had to learn that in about 1% of cases (depending on the input data) the sort order is not honored. Apparently it depends on how the optimizer arranges the execution plan.
A similar problem has been raised with MS support; the answer was that this faulty behavior will only be fixed in SQL Server 2008: link.

I will keep the post below for informational purposes. Sadly, due to this rare bug it cannot be used reliably. As a workaround you might want to perform the INSERTs in a loop using SCOPE_IDENTITY(), or (a bit ugly) insert the temporary key into a column of the target table which has the same data type, capturing the inserted values in the OUTPUT clause, and then update the temporarily used column with the real value. Note that the INSERT and UPDATE should both be performed in a single transaction, which prevents a dirty read of the newly inserted records.
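The second workaround could be sketched as follows. This is only an illustration under my own assumptions: the temporary tables #SRC, #TGT and #MAP are invented for the example, and TARGET_B is picked as the column that temporarily carries the source key because it has the same data type.

```sql
CREATE TABLE #SRC ( ID INT PRIMARY KEY IDENTITY( 3, 5 ), SOURCE_A INT, SOURCE_B INT )
CREATE TABLE #TGT ( ID INT PRIMARY KEY IDENTITY( 5, 3 ), TARGET_A INT, TARGET_B INT )
CREATE TABLE #MAP ( SOURCE_ID INT PRIMARY KEY, TARGET_ID INT )

-- Dummy data
INSERT INTO #SRC ( SOURCE_A, SOURCE_B ) SELECT id, colid FROM syscolumns

BEGIN TRANSACTION

-- Step 1: store the source key in TARGET_B and capture it together with the new ID
INSERT INTO #TGT ( TARGET_A, TARGET_B )
    OUTPUT INSERTED.TARGET_B, INSERTED.ID INTO #MAP ( SOURCE_ID, TARGET_ID )
    SELECT SOURCE_A, ID FROM #SRC

-- Step 2: overwrite the temporarily used column with its real value
UPDATE T SET T.TARGET_B = S.SOURCE_B
    FROM #TGT AS T
    JOIN #MAP AS M ON M.TARGET_ID = T.ID
    JOIN #SRC AS S ON S.ID = M.SOURCE_ID

COMMIT TRANSACTION

DROP TABLE #SRC, #TGT, #MAP
```

Because the source key travels with each inserted row, this variant does not depend on any insert order.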

Original post:

When inserting large amounts of data from one table to another, the INSERT … SELECT statement is usually the most efficient approach. However sometimes we need to be able to retrieve columns from the source table which were not part of the actual insert and combine them with data in the target table.

This is a tricky task with SQL Server 2005. It might be solved by moving from the INSERT … SELECT pattern to a row-by-row iteration over the source table (e.g. with a cursor), inserting data one row at a time, but that results in a big performance hit and additional log-space consumption.

However, there is a solution which allows us to keep the INSERT … SELECT approach with a few modifications. For this to work, we have to build a mapping table whose purpose is to map the primary key (or any unique column) of the source table to a unique column in the destination table.

Let us assume that we have a source and a destination table which cannot be modified. The only requirement for those tables is that each has a unique column which can serve as a key for identifying each row (e.g. the primary key). For this demonstration, I am declaring those tables as table variables, because it saves us the cleanup work. Of course you can use any table, table-valued function or temporary table instead, as long as it has a unique column. Note that I am intentionally declaring some weird primary key numbering intervals, because I want to show that the value of the primary key column does not matter.

-- Declare a source and a target table with any kind of primary key. We assume that both tables cannot be modified
DECLARE @SOURCE_TABLE  TABLE ( ID  INT PRIMARY KEY IDENTITY( 3, 5 ), SOURCE_A INT, SOURCE_B INT, SOURCE_C SYSNAME, SOURCE_D SYSNAME, SOURCE_E DATETIME )
DECLARE @TARGET_TABLE  TABLE ( ID  INT PRIMARY KEY IDENTITY( 5, 3 ), TARGET_A INT, TARGET_B INT, TARGET_C DATETIME DEFAULT GETUTCDATE() )

Next we declare a mapping table, which is supposed to map the primary key of the source table to the primary key of the target table. Again I am using a table variable here; if you are inserting large amounts of data, you may want to use a temporary mapping table instead. Note that the IDENTITY seed and increment match the numbering produced by the ROW_NUMBER() function, which starts at 1 and increases by 1.

-- The primary key of the mapping table must be compatible with ROW_NUMBER(). Use a temporary table with indexed SOURCE_ID and TARGET_ID for large inserts
DECLARE @MAPPING_TABLE TABLE ( ROW INT PRIMARY KEY IDENTITY( 1, 1 ), SOURCE_ID INT, TARGET_ID INT )

Preparation: To have some test data for demonstration, we grab a few columns from system tables and insert them into the source table.

-- Fill source table with some dummy data
INSERT INTO @SOURCE_TABLE ( SOURCE_A, SOURCE_B, SOURCE_C, SOURCE_D, SOURCE_E )
	SELECT O.id, C.[TYPE], O.name, C.name, O.crdate 
	FROM sysobjects AS O JOIN syscolumns AS C ON O.id = C.id

Here comes the main trick: while inserting, we capture the newly generated primary key in our mapping table. It is important to note that we are sorting the data to be inserted by the unique column. After that, we write the primary key of the source table into the mapping table. Note that the primary key of the mapping table is identical to the result of ROW_NUMBER(), and the order is the same as in the insert statement.

-- Insert source into target. Capture the inserted ID, the inserted data must be sorted by primary key of the source table
INSERT INTO @TARGET_TABLE ( TARGET_A, TARGET_B )
	OUTPUT INSERTED.ID INTO @MAPPING_TABLE ( TARGET_ID )
	SELECT SOURCE_A, SOURCE_B FROM @SOURCE_TABLE
	ORDER BY ID ASC;
 
-- Update the mapping table with the ID of the source table, which we could not capture during the insert
WITH CTE AS ( SELECT ID, ROW_NUMBER() OVER ( ORDER BY ID ASC ) AS ROW FROM @SOURCE_TABLE )
UPDATE M SET M.SOURCE_ID = S.ID 
	FROM @MAPPING_TABLE AS M
	JOIN CTE AS S ON S.ROW = M.ROW

Now we have a nice mapping table which allows us to join the source and target tables together easily!

-- After the insert we can perform a join between source table and target table
SELECT M.*, S.*, T.*  
	FROM @MAPPING_TABLE AS M
	JOIN @TARGET_TABLE AS T ON T.ID = M.TARGET_ID
	JOIN @SOURCE_TABLE AS S ON S.ID = M.SOURCE_ID
	ORDER BY M.ROW ASC

The JOIN is clean and simple, and it will work with any kind of column type, as long as the values of the columns in the join uniquely identify a row in each table.

Note: SQL Server 2008 comes with a useful MERGE statement which can be used to achieve the same result with fewer lines of code.
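A minimal sketch of the MERGE variant for SQL Server 2008 follows. It re-declares simplified versions of the tables so that it can be run on its own; the always-false join condition 1 = 0 is a trick to turn every source row into an insert. Unlike INSERT … SELECT, the OUTPUT clause of MERGE may reference columns of the source table, so no ordering assumptions are needed.

```sql
DECLARE @SOURCE_TABLE  TABLE ( ID INT PRIMARY KEY IDENTITY( 3, 5 ), SOURCE_A INT, SOURCE_B INT )
DECLARE @TARGET_TABLE  TABLE ( ID INT PRIMARY KEY IDENTITY( 5, 3 ), TARGET_A INT, TARGET_B INT )
DECLARE @MAPPING_TABLE TABLE ( SOURCE_ID INT PRIMARY KEY, TARGET_ID INT )

-- Fill source table with some dummy data
INSERT INTO @SOURCE_TABLE ( SOURCE_A, SOURCE_B ) SELECT id, colid FROM syscolumns

-- 1 = 0 never matches, so every source row goes through the INSERT branch
MERGE INTO @TARGET_TABLE AS T
USING @SOURCE_TABLE AS S ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT ( TARGET_A, TARGET_B ) VALUES ( S.SOURCE_A, S.SOURCE_B )
-- Source and target keys are captured in the same statement
OUTPUT S.ID, INSERTED.ID INTO @MAPPING_TABLE ( SOURCE_ID, TARGET_ID );

SELECT M.*, S.*, T.*
    FROM @MAPPING_TABLE AS M
    JOIN @TARGET_TABLE AS T ON T.ID = M.TARGET_ID
    JOIN @SOURCE_TABLE AS S ON S.ID = M.SOURCE_ID
```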