Archive

Posts Tagged ‘Conversion’

Storing IP Addresses in SQL Server

August 30th, 2009 No comments

Efficient storage of IP addresses can save a lot of disk and memory space while at the same time increasing query performance. Also it enables you to further process data based on subnets and network ranges.

Any IPv4 address can be represented by 32 bits. The “dotted-decimal notation” – which is the most widely known format – is just one form of writing the four bytes separated by dots. In this string representation, an IP address will consist of 7 (x.x.x.x) to 15 (xxx.xxx.xxx.xxx) characters. When encoded with an 8-bit character set, the storage space required is therefore at least 15 bytes, in contrast to 4 bytes for the binary representation. In SQL Server 2008, storing a table with a single INT column requires about 13 bytes of storage per row, whereas storing an IP address in VARCHAR(15) format requires about 26 bytes per row. Storing the address in four TINYINT columns or in a single 4 byte BINARY column also results in the same storage size as INT.

Besides saving space, storing in binary format will also allow for more efficient and smaller indexing, and hence result in faster lookup times.

As an added benefit, the binary representation enables us to perform searches based on subnets and arbitrary network ranges which is very difficult and slow when using dotted-decimal notation. You can find a real-life example in this post: Geolocation of IP Addresses.

Storing IP addresses in binary (integer) format usually requires conversion from/to the dotted-decimal notation. So the key question is how to transparently handle conversion for external applications. For us, using user defined scalar functions (UDF) has proven to be the most efficient way. Since there is no unsigned 32-bit integer format in SQL Server, we are converting all addresses into a signed integer format which only requires 4 bytes of storage and allows for efficient indexing.

CREATE FUNCTION [dbo].[GetLongIP] ( @StringIP AS VARCHAR(15) ) RETURNS  INT
BEGIN 
 
DECLARE @LongIP AS INT
DECLARE @StartPos AS SMALLINT
 
SET @LongIP = 0
SET @StartPos = 1
 
IF LEN( @StringIP ) > 15 OR LEN( @StringIP ) < 7     RETURN 0 ELSE BEGIN     IF CHARINDEX( '.', @StringIP, @StartPos ) > 1
    IF ISNUMERIC( SUBSTRING( @StringIP, @StartPos, CHARINDEX( '.', @StringIP, @StartPos ) - @StartPos ) ) = 1
    IF CAST( SUBSTRING( @StringIP, @StartPos, CHARINDEX( '.', @StringIP, @StartPos ) - @StartPos ) AS INT) <= 0xFF      IF CAST( SUBSTRING( @StringIP, @StartPos, CHARINDEX( '.', @StringIP, @StartPos ) - @StartPos ) AS INT ) >= 0x00
    BEGIN
        SET @LongIP = (CAST( SUBSTRING( @StringIP, @StartPos, CHARINDEX( '.', @StringIP, @StartPos ) - @StartPos ) AS INT ) - 0x80 ) * 0x01000000
        SET @StartPos = CHARINDEX( '.', @StringIP, @StartPos ) + 1
    END
 
    IF CHARINDEX( '.', @StringIP, @StartPos ) > 1
    IF ISNUMERIC( SUBSTRING( @StringIP, @StartPos, CHARINDEX( '.', @StringIP, @StartPos ) - @StartPos ) ) = 1
    IF CAST( SUBSTRING( @StringIP, @StartPos, CHARINDEX( '.', @StringIP, @StartPos ) - @StartPos ) AS INT) <= 0xFF      IF CAST( SUBSTRING( @StringIP, @StartPos, CHARINDEX( '.', @StringIP, @StartPos ) - @StartPos ) AS INT ) >= 0x00
    BEGIN
        SET @LongIP = @LongIP + CAST( SUBSTRING( @StringIP, @StartPos, CHARINDEX( '.', @StringIP, @StartPos ) - @StartPos ) AS INT ) * 0x00010000
        SET @StartPos = CHARINDEX( '.', @StringIP, @StartPos ) + 1
    END
 
    IF CHARINDEX( '.', @StringIP, @StartPos ) > 1
    IF ISNUMERIC( SUBSTRING( @StringIP, @StartPos, CHARINDEX( '.', @StringIP, @StartPos ) - @StartPos ) ) = 1
    IF CAST( SUBSTRING( @StringIP, @StartPos, CHARINDEX( '.', @StringIP, @StartPos ) - @StartPos ) AS INT) <= 0xFF      IF CAST( SUBSTRING( @StringIP, @StartPos, CHARINDEX( '.', @StringIP, @StartPos ) - @StartPos ) AS INT ) >= 0x00
    BEGIN
        SET @LongIP = @LongIP + CAST( SUBSTRING( @StringIP, @StartPos, CHARINDEX( '.', @StringIP, @StartPos ) - @StartPos ) AS INT ) * 0x00000100
        SET @StartPos = CHARINDEX( '.', @StringIP, @StartPos ) + 1
    END
    IF ISNUMERIC( SUBSTRING( @StringIP, @StartPos, LEN( @StringIP ) - @StartPos + 1 ) ) = 1
    IF CAST( SUBSTRING( @StringIP, @StartPos, LEN( @StringIP ) - @StartPos + 1 ) AS INT) <= 0xFF      IF CAST( SUBSTRING( @StringIP, @StartPos, LEN( @StringIP ) - @StartPos + 1 ) AS INT ) >= 0x00
        SET @LongIP = @LongIP + CAST( SUBSTRING( @StringIP, @StartPos, LEN( @StringIP ) - @StartPos + 1 ) AS INT )
END
    RETURN @LongIP
END

The above looks a bit funky, but it does the job. There is also a hack using the PARSENAME system function, which should be mentioned as well. As an added benefit, the following UDF is more than 3 times faster than the above.

CREATE FUNCTION [dbo].[GetLongIP] ( @StringIP AS VARCHAR(15) ) RETURNS  INT
BEGIN
    RETURN
    (
    ( CAST( PARSENAME( @StringIP, 4 ) AS INT ) - 0x80 )    * 0x01000000 +
      CAST( PARSENAME( @StringIP, 3 ) AS INT )            * 0x00010000 +
      CAST( PARSENAME( @StringIP, 2 ) AS INT )            * 0x00000100 +
      CAST( PARSENAME( @StringIP, 1 ) AS INT )
    )
END

Converting IP addresses from signed integer format to dotted-decimal notation is straightforward:

CREATE FUNCTION [dbo].[GetStringIP] ( @IP AS INT ) RETURNS VARCHAR(15)
BEGIN
    RETURN
    (
    CAST( (@IP & 0xff000000)/0x01000000 + 0x80    AS VARCHAR ) + '.' +
    CAST( (@IP & 0x00ff0000)/0x00010000        AS VARCHAR ) + '.' +
    CAST( (@IP & 0x0000ff00)/0x00000100        AS VARCHAR ) + '.' +
    CAST( (@IP & 0x000000ff)                AS VARCHAR )
    )
END

Both UDFs are compatible with SQL Server 2000 and newer, and have been used in a production environment for almost a decade.

When using SQL Server 2000, you may also consider to use an extended stored procedure (XP) for this purpose. An XP is a dynamic link library written in native C. This dll could make use of the itoa() and atoi() APIs or use some custom parsing code. But unless you need to pull the last bit of performance out of your system, using an XP for this purpose is discouraged due to potential security implications.

With SQL Server 2005/2008 extending functionality has become much easier and less of a security concern due to the introduction of managed CLR functions that can be invoked in the same way as a UDF.  In case you are interested, here is some C# code to do the job:

private static int GetLongIP( string s )
{
    string[] ss = s.Split( new char[] { '.' }, 4 );
    return ( ( Int32.Parse( ss[ 0 ] ) - 0x80 ) << 24 ) | ( Int32.Parse( ss[ 1 ] ) << 16 ) | ( Int32.Parse( ss[ 2 ] ) << 8 ) | Int32.Parse( ss[ 3 ] );
}
 
private static string GetStringIP( int i )
{
    return String.Format( "{0}.{1}.{2}.{3}", ( i >> 24 ) + 0x80, ( i >> 16 ) & 0xFF, ( i >> 8 ) & 0xFF, i & 0xFF );
}

This will run within a SAFE (default) permission set, since it does not reference any additional name spaces or libraries. You might be tempted to save even more work by calling member methods of the IPAddress class in the System.Net namespace, but that would require your code to run with EXTERNAL_ACCESS or UNSAFE permission set.

Testing for possible Unicode – ANSI code page compatibility

May 13th, 2009 9 comments

When dealing with a recent ExifTool remoting task, there was a question whether or not a given Unicode file name could be safely represented in the system ANSI code page. Only if the file name was fully convertible it could be passed to the application directly.

In case the file name cannot be converted to the current code page, an application which does not utilize the CreateFileW() API will not be able to open the file with this name. In case the file system supports old style DOS 8.3 filenames, the application should resort to using those instead.

BOOL IsConvertibleText( PCWSTR sFile )
{
    BOOL bRet = FALSE;
    if ( sFile )
    {
        int iBuffer = WideCharToMultiByte( CP_ACP, 0, sFile, -1, NULL, 0, NULL, NULL );
        if ( iBuffer != 0 )
        {
            iBuffer += 1;
            PSTR a = (PSTR)HeapAlloc( GetProcessHeap(), 0, iBuffer );
            if ( a )
            {
                if ( WideCharToMultiByte( CP_ACP, 0, sFile, -1, a, iBuffer, NULL, NULL ) )
                {
                    iBuffer = MultiByteToWideChar( CP_ACP, 0, a, -1, NULL, 0 );
                    if ( iBuffer != 0 )
                    {
                        iBuffer = ( iBuffer + 1 ) * sizeof(WCHAR);
                        PWSTR w = (PWSTR)HeapAlloc( GetProcessHeap(), 0, iBuffer );
                        if ( w )
                            if ( MultiByteToWideChar( CP_ACP, 0, a, -1, w, iBuffer ) )
                                if ( CompareStringW( LOCALE_SYSTEM_DEFAULT, 0, sFile, -1, w, -1 ) == CSTR_EQUAL )
                                    bRet = TRUE;
                        HeapFree( GetProcessHeap(), 0, w );
                    }
                }
                HeapFree( GetProcessHeap(), 0, a );
            }
        }
    }
    return bRet;
}

For those using C#:

bool IsConvertibleText( string sFile )
{
    byte[] b = Encoding.Default.GetBytes( sFile );
    string s = Encoding.Default.GetString( b );
    return sFile.Equals( s, StringComparison.InvariantCulture );
}

See also: Post in CPAN::Forum
Win32API::File Unicode support bug