Although both articles state that the identity generation order of an insert between two tables in the same database is preserved, we had to learn that in about 1% of cases (depending on the input data) the sort order is not honored. Apparently this depends on how the optimizer arranges the execution plan.
A similar problem was reported to MS support, and the answer was that this faulty behavior will only be fixed in SQL Server 2008: link.
I will keep the post below for informational purposes. Sadly, due to this rare bug, it cannot be used reliably. As a workaround, you might want to perform the INSERTs in a loop using SCOPE_IDENTITY(), or (a bit ugly) insert the temporary key into a column of the target table that has the same data type, and capture the inserted values with the OUTPUT clause. Then update the temporarily used column with the real value. Note that the INSERT and the UPDATE should both be performed in a single transaction, which prevents a dirty read on the newly inserted records.
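The second workaround could be sketched roughly as follows. This is only a minimal illustration, not a tested production script: the temporary tables #SRC, #TGT and #MAP and their columns are hypothetical, and TARGET_B is assumed to be a column that can hold the source key for a moment because it has the same data type.

```sql
-- Hypothetical tables, for illustration only
CREATE TABLE #SRC ( ID INT PRIMARY KEY, A INT, B INT );
CREATE TABLE #TGT ( ID INT PRIMARY KEY IDENTITY( 1, 1 ), TARGET_A INT, TARGET_B INT );
CREATE TABLE #MAP ( SOURCE_ID INT, TARGET_ID INT );

INSERT INTO #SRC ( ID, A, B ) VALUES ( 10, 1, 100 );
INSERT INTO #SRC ( ID, A, B ) VALUES ( 20, 2, 200 );

BEGIN TRANSACTION;

-- Step 1: park the source key in TARGET_B and capture it
-- together with the newly generated identity value
INSERT INTO #TGT ( TARGET_A, TARGET_B )
OUTPUT INSERTED.TARGET_B, INSERTED.ID INTO #MAP ( SOURCE_ID, TARGET_ID )
SELECT A, ID
FROM #SRC;

-- Step 2: overwrite the parked key with the real value
UPDATE T
SET T.TARGET_B = S.B
FROM #TGT AS T
JOIN #MAP AS M ON M.TARGET_ID = T.ID
JOIN #SRC AS S ON S.ID = M.SOURCE_ID;

COMMIT TRANSACTION;

-- Each row pairs a source key with the identity value generated for it
SELECT SOURCE_ID, TARGET_ID FROM #MAP;

DROP TABLE #MAP;
DROP TABLE #TGT;
DROP TABLE #SRC;
```

Because both values are captured from the same inserted row, this variant does not depend on the order in which identity values are handed out.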
When inserting large amounts of data from one table to another, the INSERT
A tricky task with SQL Server 2005, which might be solved by moving from an INSERT … SELECT pattern to a row-by-row iteration over the source table (e.g. cursor) and inserting data one row at a time – which results in a big performance hit and additional log-space consumption.
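For comparison, the row-by-row approach might look like the sketch below. It assumes hypothetical temporary tables #SRC, #TGT and #MAP and uses a cursor together with SCOPE_IDENTITY(). It is meant to show the pattern only, not to recommend it for large data sets.

```sql
-- Hypothetical tables, for illustration only
CREATE TABLE #SRC ( ID INT PRIMARY KEY, A INT, B INT );
CREATE TABLE #TGT ( ID INT PRIMARY KEY IDENTITY( 1, 1 ), TARGET_A INT, TARGET_B INT );
CREATE TABLE #MAP ( SOURCE_ID INT PRIMARY KEY, TARGET_ID INT );

INSERT INTO #SRC ( ID, A, B ) VALUES ( 10, 1, 100 );
INSERT INTO #SRC ( ID, A, B ) VALUES ( 20, 2, 200 );

DECLARE @SourceId INT, @A INT, @B INT;

DECLARE SRC_CURSOR CURSOR LOCAL FAST_FORWARD FOR
    SELECT ID, A, B FROM #SRC ORDER BY ID;

OPEN SRC_CURSOR;
FETCH NEXT FROM SRC_CURSOR INTO @SourceId, @A, @B;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Insert a single row into the target table
    INSERT INTO #TGT ( TARGET_A, TARGET_B ) VALUES ( @A, @B );

    -- SCOPE_IDENTITY() returns the identity value generated by the insert above
    INSERT INTO #MAP ( SOURCE_ID, TARGET_ID ) VALUES ( @SourceId, SCOPE_IDENTITY() );

    FETCH NEXT FROM SRC_CURSOR INTO @SourceId, @A, @B;
END

CLOSE SRC_CURSOR;
DEALLOCATE SRC_CURSOR;

SELECT SOURCE_ID, TARGET_ID FROM #MAP;

DROP TABLE #MAP;
DROP TABLE #TGT;
DROP TABLE #SRC;
```

Every source row costs two separate INSERT statements here, which is where the performance hit and the additional log-space consumption come from.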
However, there is a solution that lets us keep the INSERT … SELECT approach with a few modifications. For this to work, we have to build a mapping table whose purpose is to map the primary key (or any unique column) of the source table to a unique column in the destination table.
Let us assume that we have a source and a destination table which cannot be modified. The only requirement for those tables is that each has a unique column which can serve as a key for identifying each row (e.g. the primary key). For this demonstration, I am declaring those tables as table variables, because that saves us the cleanup work. Of course, you can use any table, table-valued function or temporary table instead, as long as it has a unique column. Note that I am intentionally declaring some weird primary key numbering intervals, because I want to show that the value of the primary key column does not matter.
-- Declare a source and a target table with any kind of primary key.
-- We assume that both tables cannot be modified
DECLARE @SOURCE_TABLE TABLE
(
    ID INT PRIMARY KEY IDENTITY( 3, 5 ),
    SOURCE_A INT,
    SOURCE_B INT,
    SOURCE_C SYSNAME,
    SOURCE_D SYSNAME,
    SOURCE_E DATETIME
)
DECLARE @TARGET_TABLE TABLE
(
    ID INT PRIMARY KEY IDENTITY( 5, 3 ),
    TARGET_A INT,
    TARGET_B INT,
    TARGET_C DATETIME DEFAULT GETUTCDATE()
)
Next we declare a mapping table, which is supposed to map the primary key of the source table to the primary key of the target table. Again I am using a table variable here. If you are inserting large amounts of data, you may want to use a temporary mapping table instead. Note that the IDENTITY seed and increment (1, 1) match the numbering produced by the ROW_NUMBER() function.
-- The primary key of the mapping table must be compatible with ROW_NUMBER().
-- Use a temporary table with indexed SOURCE_ID and TARGET_ID for large inserts
DECLARE @MAPPING_TABLE TABLE
(
    ROW INT PRIMARY KEY IDENTITY( 1, 1 ),
    SOURCE_ID INT,
    TARGET_ID INT
)
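For large inserts, the temporary-table variant mentioned in the comment above might be declared like this. It is a sketch only: the index names are hypothetical, and whether both indexes pay off depends on how the mapping table is queried afterwards.

```sql
-- Temporary mapping table, same layout as the table variable
CREATE TABLE #MAPPING_TABLE
(
    ROW INT PRIMARY KEY IDENTITY( 1, 1 ),
    SOURCE_ID INT,
    TARGET_ID INT
);

-- Hypothetical index names; they support the joins to the source and target tables
CREATE INDEX IX_MAPPING_SOURCE_ID ON #MAPPING_TABLE ( SOURCE_ID );
CREATE INDEX IX_MAPPING_TARGET_ID ON #MAPPING_TABLE ( TARGET_ID );

-- Unlike a table variable, a temporary table should be dropped when no longer needed
DROP TABLE #MAPPING_TABLE;
```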
Preparation: To have some test data for demonstration, we grab a few columns from system tables and insert them into the source table.
-- Fill source table with some dummy data
INSERT INTO @SOURCE_TABLE ( SOURCE_A, SOURCE_B, SOURCE_C, SOURCE_D, SOURCE_E )
SELECT O.id, C.[TYPE], O.name, C.name, O.crdate
FROM sysobjects AS O
JOIN syscolumns AS C ON O.id = C.id
Here comes the main trick: while inserting, we capture the newly generated primary key in our mapping table. It is important to note that we sort the data to be inserted by the unique column. After that, we write the primary key of the source table into the mapping table. Note that the primary key of the mapping table is identical to the result of ROW_NUMBER(), and the order is the same as in the insert statement.
-- Insert source into target. Capture the inserted ID;
-- the inserted data must be sorted by the primary key of the source table
INSERT INTO @TARGET_TABLE ( TARGET_A, TARGET_B )
OUTPUT INSERTED.ID INTO @MAPPING_TABLE ( TARGET_ID )
SELECT SOURCE_A, SOURCE_B
FROM @SOURCE_TABLE
ORDER BY ID ASC;

-- Update the mapping table with the ID of the source table,
-- which we could not capture during the insert
WITH CTE AS
(
    SELECT ID, ROW_NUMBER() OVER ( ORDER BY ID ASC ) AS ROW
    FROM @SOURCE_TABLE
)
UPDATE M
SET M.SOURCE_ID = S.ID
FROM @MAPPING_TABLE AS M
JOIN CTE AS S ON S.ROW = M.ROW
Now we have a nice mapping table which allows us to join the source and target tables together easily!
-- After the insert we can perform a join between source table and target table
SELECT M.*, S.*, T.*
FROM @MAPPING_TABLE AS M
JOIN @TARGET_TABLE AS T ON T.ID = M.TARGET_ID
JOIN @SOURCE_TABLE AS S ON S.ID = M.SOURCE_ID
ORDER BY M.ROW ASC
The JOIN is clean and simple, and it will work with any kind of column type, as long as the values of the columns in the join uniquely identify a row in each table.
Note: SQL Server 2008 comes with a useful MERGE statement which can be used to achieve the same result with fewer lines of code.
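A minimal sketch of the MERGE variant, assuming SQL Server 2008 or later and hypothetical temporary tables #SRC, #TGT and #MAP: the join condition 1 = 0 never matches, so every source row is inserted, and the OUTPUT clause of MERGE may reference source columns alongside the INSERTED values.

```sql
-- Hypothetical tables, for illustration only (requires SQL Server 2008 or later)
CREATE TABLE #SRC ( ID INT PRIMARY KEY, A INT, B INT );
CREATE TABLE #TGT ( ID INT PRIMARY KEY IDENTITY( 1, 1 ), TARGET_A INT, TARGET_B INT );
CREATE TABLE #MAP ( SOURCE_ID INT, TARGET_ID INT );

INSERT INTO #SRC ( ID, A, B ) VALUES ( 10, 1, 100 );
INSERT INTO #SRC ( ID, A, B ) VALUES ( 20, 2, 200 );

-- The condition 1 = 0 is never true, so all source rows take the NOT MATCHED branch
MERGE INTO #TGT AS T
USING #SRC AS S
ON 1 = 0
WHEN NOT MATCHED THEN
    INSERT ( TARGET_A, TARGET_B ) VALUES ( S.A, S.B )
-- Source key and generated identity are captured from the same row
OUTPUT S.ID, INSERTED.ID INTO #MAP ( SOURCE_ID, TARGET_ID );

SELECT SOURCE_ID, TARGET_ID FROM #MAP;

DROP TABLE #MAP;
DROP TABLE #TGT;
DROP TABLE #SRC;
```

Since the mapping is captured per row, this variant needs neither the ORDER BY nor the ROW_NUMBER() update step.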