Archive

Archive for the ‘SQL’ Category

Visual Studio 2012 Entity Data Model Wizard Annoyance

November 13th, 2012 No comments

Just stumbled upon an annoyance in VS 2012 Entity data model wizard. I had implemented a class and then decided to add an Entity Model to the project. Since the model was supposed to implement storage functionality for the previously written class, it was named the same as the .cs file of that class (except the .cs extension).
It turned out that this setting will silently overrwrite the .cs file with an entity framework generated class.

T-SQL Capitalize First Character in Each Word

July 26th, 2012 No comments

This is a small exercise in T-SQL for a common situation: It happens quite frequently that data is entered manually in slightly different ways regarding upper and lower case. Examples are “james smith”, “Gerald BLACK”, “MIKE O’NEILL” etc.

Normalizing such input can be done in a procedural way by iterating over each record with a couple of variables and two loops.

Is it possible to normalize such input using T-SQL in a non-procedural manner? It is, though I would leave it up to the reader to judge whether or not it is more efficient.

WITH CTE1 AS(
	SELECT 0 AS ID, N'章子怡' AS A UNION ALL
	SELECT 1, 'new york' UNION ALL
	SELECT 2, 'michael BOND' UNION ALL
	SELECT 3, 'DAN john o''keele' UNION ALL 
	SELECT 4, 'michaela dÜmpel-gerümpel'	UNION ALL
	SELECT 5, 'mieze katze luftMatratze' UNION ALL
	SELECT 6, 'gute grütze gedankenStütze'
), CTE2 AS (
	SELECT ROW_NUMBER() OVER( ORDER BY ID ) AS ID1, * FROM CTE1
), CTE3 AS (
	SELECT ID1, 1 AS ID2, SUBSTRING( A, 1, 1 ) AS C, A FROM CTE2
	UNION ALL 
	SELECT ID1, ID2 + 1, SUBSTRING( A, ID2 + 1, 1 ), A FROM CTE3 WHERE ID2 < LEN( A )
), CTE4 AS (
	SELECT A.ID1, A.ID2, A.A, CASE WHEN B.C IS NULL THEN UPPER( A.C ) ELSE LOWER( A.C ) END AS C FROM CTE3 AS A
		LEFT JOIN CTE3 AS B ON A.ID1 = B.ID1 AND A.ID2 - 1 = B.ID2 AND B.C NOT IN ( ' ', '.', ',', '[', '(', '''', '-' )
), CTE5 AS (
	SELECT CTE2.ID1, ( SELECT CTE4.C + '' FROM CTE4 WHERE CTE4.ID1 = CTE2.ID1 ORDER BY CTE4.ID2 FOR XML PATH(''), TYPE ).VALUE( 'text()[1]', 'NVARCHAR(MAX)' ) AS A
		FROM CTE2
)
SELECT * FROM CTE5 ORDER BY ID1 ASC OPTION( MAXRECURSION 32767 )

Here is the input:

0 章子怡
1 new york
2 michael BOND
3 DAN john o’keele
4 michaela dÜmpel-gerümpel
5 mieze katze luftMatratze
6 gute grütze gedankenStütze

This becomes:

1 章子怡
2 New York
3 Michael Bond
4 Dan John O’Keele
5 Michaela Dümpel-Gerümpel
6 Mieze Katze Luftmatratze
7 Gute Grütze Gedankenstütze

Obviously the above code won’t work if your name is MacGuyver, DiRosa or McDonalds. But it would not take much more than another CTE to fix such names as well as long as they adhere to a common capitalization scheme.

Categories: Software Development, SQL Tags:

T-SQL Word Randomizer – Does the Order of Letters in a Word Matter?

May 19th, 2012 No comments

Perhaps you have come across this paragraph before:

Acdrocing to a rcersaheer at Cdmiabrge Uterniisvy, it dens’ot mttear in waht odrer the letters in a word are, the only inapmtrot tinhg is that the frist and last lteetr be at the rghit plcae. The rset can be a ttoal mess and you can siltl raed it wtihuot plreobm. Tihs is baesuce the hmuan mind does not raed ervey lteetr by ilestf but the wrod as a wlohe.

When I saw this, I thought about how to write such a randomizer in T-SQL without using loops and variables:

WITH CTE1 AS(
	SELECT 'According to a researcher at Cambridge University, it doesn''t matter in what order the letters in a word are, the only important thing is that the first and last letter be at the right place. The rest can be a total mess and you can still read it without problem. This is because the human mind does not read every letter by itself but the word as a whole.' AS A
	UNION ALL
	SELECT 'Manche Leute sagen es spielt keine Rolle, in welcher Reihenfolge die Buchstaben in einem Wort angeordnet sind, solange der erste und der letzte Buchstabe nicht verändert wird.'
), CTE2 AS (
	SELECT ROW_NUMBER() OVER( ORDER BY LEFT( A, 0 ) ) AS ID1, * FROM CTE1
), CTE3 AS (
	SELECT ID1, 1 AS ID2, SUBSTRING( A, 1, 1 ) AS C, A FROM CTE2
	UNION ALL 
	SELECT ID1, ID2 + 1, SUBSTRING( A, ID2 + 1, 1 ), A FROM CTE3 WHERE ID2 < LEN( A )
), CTE4 AS (
	SELECT A.ID1, A.ID2, A.A, A.C, CASE WHEN B.C IS NOT NULL AND C.C IS NOT NULL THEN 0 ELSE 1 END AS ID3 FROM CTE3 AS A
		LEFT JOIN CTE3 AS B ON A.ID1 = B.ID1 AND A.ID2 - 1 = B.ID2 AND B.C NOT IN ( ' ', '.', ',', '[', '(', ']', ')' )
		LEFT JOIN CTE3 AS C ON A.ID1 = C.ID1 AND A.ID2 + 1 = C.ID2 AND C.C NOT IN ( ' ', '.', ',', '[', '(', ']', ')' )
), CTE5 AS (
	SELECT A.ID1, A.ID2, A.A, A.C, CASE A.ID3 WHEN 0 THEN ( SELECT MAX( B.ID2 ) + 1 FROM CTE4 AS B WHERE A.ID1 = B.ID1 AND B.ID2 < A.ID2 AND B.ID3 = 1 ) ELSE ID2 END AS ID3 FROM CTE4 AS A
), CTE6 AS (
	SELECT CTE2.ID1, ( SELECT CTE5.C + '' FROM CTE5 WHERE CTE5.ID1 = CTE2.ID1 ORDER BY CTE5.ID3, NEWID() FOR XML PATH(''), TYPE ).VALUE( 'text()[1]', 'NVARCHAR(MAX)' ) AS A
		FROM CTE2
)
SELECT * FROM CTE6 ORDER BY ID1 ASC OPTION( MAXRECURSION 32767 )

Here is a German example:

Mahnce Letue seagn es speilt kenie Rolle, in wcelher Renfhioglee die Bbetschuan in eniem Wort aegdnonret snid, solnage der erste und der ltteze Bsbctauhe nihct vedrreänt wrid.

Ok, this is not terribly efficient, but it proves that recursive CTEs have their right of existence at last ;-)