Archive

Posts Tagged ‘SQL’

T-SQL Word Randomizer – Does the Order of Letters in a Word Matter?

May 19th, 2012 No comments

Perhaps you have come across this paragraph before:

Acdrocing to a rcersaheer at Cdmiabrge Uterniisvy, it dens’ot mttear in waht odrer the letters in a word are, the only inapmtrot tinhg is that the frist and last lteetr be at the rghit plcae. The rset can be a ttoal mess and you can siltl raed it wtihuot plreobm. Tihs is baesuce the hmuan mind does not raed ervey lteetr by ilestf but the wrod as a wlohe.

When I saw this, I thought about how to write such a randomizer in T-SQL without using loops and variables:

WITH CTE1 AS(
	SELECT 'According to a researcher at Cambridge University, it doesn''t matter in what order the letters in a word are, the only important thing is that the first and last letter be at the right place. The rest can be a total mess and you can still read it without problem. This is because the human mind does not read every letter by itself but the word as a whole.' AS A
	UNION ALL
	SELECT 'Manche Leute sagen es spielt keine Rolle, in welcher Reihenfolge die Buchstaben in einem Wort angeordnet sind, solange der erste und der letzte Buchstabe nicht verändert wird.'
), CTE2 AS (
	SELECT ROW_NUMBER() OVER( ORDER BY LEFT( A, 0 ) ) AS ID1, * FROM CTE1
), CTE3 AS (
	SELECT ID1, 1 AS ID2, SUBSTRING( A, 1, 1 ) AS C, A FROM CTE2
	UNION ALL 
	SELECT ID1, ID2 + 1, SUBSTRING( A, ID2 + 1, 1 ), A FROM CTE3 WHERE ID2 < LEN( A )
), CTE4 AS (
	SELECT A.ID1, A.ID2, A.A, A.C, CASE WHEN B.C IS NOT NULL AND C.C IS NOT NULL THEN 0 ELSE 1 END AS ID3 FROM CTE3 AS A
		LEFT JOIN CTE3 AS B ON A.ID1 = B.ID1 AND A.ID2 - 1 = B.ID2 AND B.C NOT IN ( ' ', '.', ',', '[', '(', ']', ')' )
		LEFT JOIN CTE3 AS C ON A.ID1 = C.ID1 AND A.ID2 + 1 = C.ID2 AND C.C NOT IN ( ' ', '.', ',', '[', '(', ']', ')' )
), CTE5 AS (
	SELECT A.ID1, A.ID2, A.A, A.C, CASE A.ID3 WHEN 0 THEN ( SELECT MAX( B.ID2 ) + 1 FROM CTE4 AS B WHERE A.ID1 = B.ID1 AND B.ID2 < A.ID2 AND B.ID3 = 1 ) ELSE ID2 END AS ID3 FROM CTE4 AS A
), CTE6 AS (
	SELECT CTE2.ID1, ( SELECT CTE5.C + '' FROM CTE5 WHERE CTE5.ID1 = CTE2.ID1 ORDER BY CTE5.ID3, NEWID() FOR XML PATH(''), TYPE ).VALUE( 'text()[1]', 'NVARCHAR(MAX)' ) AS A
		FROM CTE2
)
SELECT * FROM CTE6 ORDER BY ID1 ASC OPTION( MAXRECURSION 32767 )

Here is a German example:

Mahnce Letue seagn es speilt kenie Rolle, in wcelher Renfhioglee die Bbetschuan in eniem Wort aegdnonret snid, solnage der erste und der ltteze Bsbctauhe nihct vedrreänt wrid.

Ok, this is not terribly efficient, but it proves that recursive CTEs have their right of existence at last ;-)

Merging Inserted Data Using OUTPUT in SQL Server 2005

July 22nd, 2011 No comments

Warning: there is evidence that SQL Server 2005 does not preserve sort order in all cases, contrary to Microsoft’s statements in the Knowledge Base and here.

Although both articles state that the identity generation order of an insert between two tables in the same database is preserved, we had to learn that there is about 1% of cases (depending on input data) where the sort order is not honored. Apparently it depends on how the optimizer arranges the execution plan.
There has been a similar problem addressed to MS support, with the answer being that this faulty behavior is only to be fixed in SQL Server 2008: link.

I will keep the below post for informational purposes. Sadly, due to a rare bug it cannot be used reliably. As a workaround you might want to perform INSERTs in a loop using SCOPE_IDENTITY(), or (a bit ugly) insert the temporary key into a column in the target table which has the same data type (and capturing the inserted values in the OUTPUT clause). Then update the temporarily used column with the real value. Note that the INSERT and UPDATE should both be performed in a single transaction which prevents a dirty read on the newly inserted records.

Original post:

When inserting large amounts of data from one table to another, the INSERT … SELECT statement is usually the most efficient approach. However sometimes we need to be able to retrieve columns from the source table which were not part of the actual insert and combine them with data in the target table.

A tricky task with SQL Server 2005, which might be solved by moving from an INSERT … SELECT pattern to a row-by-row iteration over the source table (e.g. cursor) and inserting data one row at a time – which results in a big performance hit and additional log-space consumption.

However there is a solution which allows us to keep the INSERT … SELECT approach with a few modifications. For this to work, we have to build a mapping table, which has the purpose to map the primary key (or any unique column) from the source table to a unique column in the destination table.

Let us assume that we have a source and destination table which cannot me modified. The only requirement for those tables is that each has a unique column, which can serve as a key for identifying each row (e.g. primary key). For this demonstration, I am declaring those tables as table variables, because it saves us the cleanup work. Off course you can use any table, table valued function or temporary table instead, as long as it has a unique column. Note that I am intentionally declaring some weird primary key numbering intervals, that is because I want to show that the value of the primary key column does not matter.

-- Declare a source and a target table with any kind of primary key. We assume that both tables cannot be modified
DECLARE @SOURCE_TABLE  TABLE ( ID  INT PRIMARY KEY IDENTITY( 3, 5 ), SOURCE_A INT, SOURCE_B INT, SOURCE_C SYSNAME, SOURCE_D SYSNAME, SOURCE_E DATETIME )
DECLARE @TARGET_TABLE  TABLE ( ID  INT PRIMARY KEY IDENTITY( 5, 3 ), TARGET_A INT, TARGET_B INT, TARGET_C DATETIME DEFAULT GETUTCDATE() )

Next we declare a mapping table, which is supposed to map the primary key of the source table to the primary key of the target table. Again I am using a table variable here, if you are inserting large amounts of data, you may want to use a temporary mapping table instead. Note that the IDENTITY start and interval is identical with the result of the ROW_NUMBER() function.

-- The primary key of the mapping table must be compatible with ROW_NUMBER(). Use a temporary table with indexed SOURCE_ID and TARGET_ID for large inserts
DECLARE @MAPPING_TABLE TABLE ( ROW INT PRIMARY KEY IDENTITY( 1, 1 ), SOURCE_ID INT, TARGET_ID INT )

Preparation: To have some test data for demonstration, we grab a few columns from system tables and insert them into the source table.

-- Fill source table with some dummy data
INSERT INTO @SOURCE_TABLE ( SOURCE_A, SOURCE_B, SOURCE_C, SOURCE_D, SOURCE_E )
	SELECT O.id, C.[TYPE], O.name, C.name, O.crdate 
	FROM sysobjects AS O JOIN syscolumns AS C ON O.id = C.id

Here comes the main trick: While inserting, we capture the newly generated primary key in our mapping table. It is important to note that we are sorting the data to be inserted according to the unique column. After that, we update the primary key from the source table into the mapping table. Note that the primary key of the mapping table is identical with the result of ROW_NUMBER(), and the order is the same as in the insert statement.

-- Insert source into target. Capture the inserted ID, the inserted data must be sorted by primary key of the source table
INSERT INTO @TARGET_TABLE ( TARGET_A, TARGET_B )
	OUTPUT INSERTED.ID INTO @MAPPING_TABLE ( TARGET_ID )
	SELECT SOURCE_A, SOURCE_B FROM @SOURCE_TABLE
	ORDER BY ID ASC;
 
-- Update the mapping table with the ID of the source table, which we could not capture during the insert
WITH CTE AS ( SELECT ID, ROW_NUMBER() OVER ( ORDER BY ID ASC ) AS ROW FROM @SOURCE_TABLE )
UPDATE M SET M.SOURCE_ID = S.ID 
	FROM @MAPPING_TABLE AS M
	JOIN CTE AS S ON S.ROW = M.ROW

Now we have a nice mapping table which allows us to join the source and target tables together easily!

-- After the insert we can perform a join between source table and target table
SELECT M.*, S.*, T.*  
	FROM @MAPPING_TABLE AS M
	JOIN @TARGET_TABLE AS T ON T.ID = M.TARGET_ID
	JOIN @SOURCE_TABLE AS S ON S.ID = M.SOURCE_ID
	ORDER BY M.ROW ASC

The JOIN is clean and simple, and it will work with any kind of column type, as long as the values of the columns in the join uniquely identify a row in each table.

Note: SQL Server 2008 comes with a useful MERGE statement which can be used to achieve the same result with less lines of code.

Table-Valued Parameters in .NET

March 29th, 2011 No comments

SQL Server 2008 supports a new way of passing a variable number of arguments to functions and stored procedures.
The good thing for .NET developers is that those table-valued parameters are also usable from a managed context.
So when doing a mass join/mass update using SqlCommand, this approach offers a huge increase in performance when migrating from a line-by-line approach to a single statement that is to be executed.

The following example is to demonstrate a sample usage of this technique. We are assuming that we want to insert a set of Unicode strings into the database.

First of all, TVPs are only supported with custom types that must be created prior to execution.

CREATE TYPE dbo.MyType AS TABLE( MyColumn NVARCHAR(MAX) ); 
GRANT EXEC ON TYPE::dbo.MyType TO MyRole;

This is kind of a drawback if you are using a large number of different data types from different queries.

Here is how to pass data from an array to SQL Server:

using ( SqlConnection con = new SqlConnection( "connection string" ) )
{
    con.Open();
    using ( SqlCommand cmd = con.CreateCommand() )
    {
        string[] arr = { "a", "b", "c", "d" };
        SqlMetaData meta = new SqlMetaData( "MyColumn", SqlDbType.NVarChar, SqlMetaData.Max );
        var records = arr.Select( x => { SqlDataRecord r = new SqlDataRecord( meta ); r.SetString( 0, x ); return r; } );
 
        SqlParameter UrlParam = cmd.Parameters.AddWithValue( "@MyTVP", records );
        UrlParam.SqlDbType = SqlDbType.Structured;
        UrlParam.TypeName = "dbo.MyType";
        cmd.CommandText = "INSERT INTO MyTable ( SomeColumn ) SELECT MyAlias.MyColumn FROM @MyTVP AS MyAlias;"; 
        cmd.ExecuteNonQuery();
    }
}

In order to send the array data to the database server, we need to wrap it in an IEnumerable. This is then added to the parameters collection with the special type qualifier SqlDbType.Structured. On the SQL side, the handling is the same as a real table in the database, although it might be necessary to work with aliases when referencing the TVP.

Note that in this example we are limiting the insert to a single column. However you may also create TVPs with multiple columns and different data types.

This approach does not offer much savings when dealing with a few records to be passed to the server. However when you want to insert a few hundred records at once, executing hundreds of queries instead of a single one pays off.