got net?

Kevin Hazzard's Brain Spigot

Josh Carlisle and His Best Friend Fred

August 7, 2008 21:45 by kevin

Josh Carlisle spoke to the Richmond .NET User Group this evening on SharePoint Development for ASP.NET Developers. He was carrying what looked like a flask of vodka, which made me think, "This is a guy I've got to hang out with." It turned out to be Fred Bottled Water. Unfortunately, I didn't get a picture of Josh with Fred, but here are pics of both of them. Doesn't Josh look unusually happy? He swears it was water of "exceptional purity with a high degree of virginality." Yeah, right.

Josh Carlisle speaks to the Richmond .NET User Group on 7 August 2008

Josh's presentation was very good. He was a bit perplexed near the end because of SharePoint's pesky insistence on treating the term MasterPage differently from MasterPages (plural). Pfft! SharePoint is so picky like that. Anyway, we had a good time and Josh was just great. He really knows his stuff. He's welcome back to Richmond at any time. Maybe I could get him up here for Code Camp on October 4, 2008. We got to see Nas Ali, too, who travelled to Richmond with Josh, I think. Always good to see Nas. He confirmed with me that he will be speaking for us at the upcoming Code Camp. Nas is a good speaker and his talks are not to be missed.

Thanks again, Josh, for coming to Richmond!



Making SQL and .NET SHA1 Hashes Match

August 7, 2008 13:37 by kevin

A friend at SnagAJob.com came to me with an interesting problem today. He said that the HashBytes function in SQL Server was outputting different results from the HashAlgorithm.ComputeHash method in .NET. Here's a T-SQL script that hashes the URL to my blog.

DECLARE @data NVARCHAR(max)
SET @data = N'http://www.gotnet.biz/Blog'
SELECT HashBytes('SHA1', @data)

This script outputs 0x7FC8C5E43E9425C890AB96E660C86FC9CB077F4D as the hash value. The algorithm in C# attempting to do the same thing might look like this:

using System;
using System.Security.Cryptography;
using System.Text;

public class HashTest
{
    static void Main()
    {
        DoHash(new SHA1CryptoServiceProvider());
        Console.ReadLine();
    }

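    // Hashes the blog URL with the given algorithm and prints the
    // algorithm's type name followed by the hash as uppercase hex.
    private static void DoHash(HashAlgorithm algo)
    {
        // Turn the string into bytes; the choice of encoding here
        // determines exactly which bytes get hashed.
        var bytes = Encoding.UTF8.GetBytes(
            "http://www.gotnet.biz/Blog");
        var hash = algo.ComputeHash(bytes);
        Console.Write("{0} ", algo.GetType().Name);
        // Two hex digits per hash byte.
        foreach (var b in hash)
            Console.Write("{0:X2}", b);
        Console.WriteLine();
    }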
}

This code outputs 0x10397796345455fa6332db477972dc360b54ef2, a different hash value. Do you see the problem in the code? I didn't at first, but it's simpler than you think.

The encoding that I used in the C# code is UTF8, which stands for the 8-bit Universal Character Set/UNICODE Transformation Format. That's a mouthful, isn't it? In .NET, the UTF8 encoding corresponds to Windows code page 65001, where each source character maps to between one and four bytes in the encoded output. I reached for that encoding out of habit: working with XML as often as I do, I'm accustomed to using UTF8 for nearly everything. My friend who posed the original question had done the same thing. However, in this case, it's a bad choice.
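To make the variable width of UTF8 concrete, here is a minimal sketch that counts the encoded bytes for a few sample characters. The class and method names (Utf8Widths, Utf8ByteCount) are made up for illustration; only Encoding.UTF8.GetByteCount comes from the framework.

```csharp
using System;
using System.Text;

public static class Utf8Widths
{
    // Returns how many bytes UTF8 needs to encode the given text.
    public static int Utf8ByteCount(string text)
    {
        return Encoding.UTF8.GetByteCount(text);
    }

    public static void Main()
    {
        Console.WriteLine(Utf8ByteCount("A"));          // 1 (ASCII letter)
        Console.WriteLine(Utf8ByteCount("\u00E9"));     // 2 (e with acute accent)
        Console.WriteLine(Utf8ByteCount("\u20AC"));     // 3 (euro sign)
        Console.WriteLine(Utf8ByteCount("\U0001F600")); // 4 (character outside the BMP)
        // The blog URL is pure ASCII, so it takes one byte per character.
        Console.WriteLine(Utf8ByteCount("http://www.gotnet.biz/Blog")); // 26
    }
}
```

Because the URL in my example contains only ASCII characters, UTF8 produces exactly one byte for each of its 26 characters.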

Looking at the T-SQL code above, notice that the data type for my string is NVARCHAR, which is SQL Server's UNICODE string type and stores its data as 16-bit code units. And although all strings in .NET are stored in UNICODE and the UTF8 encoding is, as its name implies, just transforming the UNICODE to an 8-bit transportable format, both HashBytes and ComputeHash operate on raw bytes, not characters. The UTF8 bytes that .NET hashes are not the bytes that SQL Server hashes, so the computed SHA1 hashes differ.

Playing around with some other encodings in the System.Text namespace, I discovered that replacing the UTF8 encoding with the so-called Unicode encoding (or switching the SQL data type to VARCHAR) makes the hash computations match between SQL and .NET in my example above. I capitalized Unicode as I did there quite deliberately because I am referring to the type in the System.Text namespace called UnicodeEncoding (which is available as the static Unicode property on the Encoding class), not the UNICODE standard.
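Here is a rough sketch of that change, with the encoding passed in as a parameter so the two can be compared side by side. The names SqlHashMatch and Sha1Hex are my own for this illustration, and I've used SHA1.Create() rather than SHA1CryptoServiceProvider, which is an assumption about what your framework version prefers. The "0x" prefix is added only to mimic how SQL Server displays binary values.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class SqlHashMatch
{
    // Encodes the text with the given encoding, hashes the bytes with SHA1,
    // and returns the result as "0x" followed by uppercase hex digits.
    public static string Sha1Hex(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        using (SHA1 sha1 = SHA1.Create())
        {
            byte[] hash = sha1.ComputeHash(bytes);
            var sb = new StringBuilder("0x");
            foreach (byte b in hash)
                sb.Append(b.ToString("X2"));
            return sb.ToString();
        }
    }

    public static void Main()
    {
        const string url = "http://www.gotnet.biz/Blog";
        // Encoding.Unicode is the one to compare against HashBytes on an NVARCHAR.
        Console.WriteLine("Unicode: " + Sha1Hex(url, Encoding.Unicode));
        // Encoding.UTF8 is what the original code used.
        Console.WriteLine("UTF8:    " + Sha1Hex(url, Encoding.UTF8));
    }
}
```

Run it and compare the Unicode line against the value that the T-SQL script reports for the NVARCHAR variable.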

In .NET, the Unicode encoding corresponds to Windows code page 1200 and goes by the familiar alias UTF-16 (little-endian byte order, to be exact). As that alias may imply, the .NET UnicodeEncoding uses a sequence of one or two 16-bit integers to represent each character in the original text. The results are easy to understand visually so I made the graphic shown here.
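The same comparison can be reproduced in code. This is a small sketch that dumps the encoded bytes of a short string under both encodings; EncodingDump and HexBytes are hypothetical names I picked for the example, and "http" is just the first four characters of the URL.

```csharp
using System;
using System.Text;

public static class EncodingDump
{
    // Formats the encoded bytes of the text as space-separated hex pairs.
    public static string HexBytes(string text, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(text)).Replace("-", " ");
    }

    public static void Main()
    {
        // One byte per ASCII character.
        Console.WriteLine("UTF8:    " + HexBytes("http", Encoding.UTF8));
        // prints: UTF8:    68 74 74 70

        // Two bytes per ASCII character, low byte first, high byte zero.
        Console.WriteLine("Unicode: " + HexBytes("http", Encoding.Unicode));
        // prints: Unicode: 68 00 74 00 74 00 70 00
    }
}
```

SHA1 is computed over those bytes, so two byte streams that differ this much have no reason to produce the same hash.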

You can see that the contents of the byte streams from the two encodings are different. For characters in the ASCII range, like those in my URL, the UTF8 encoding emits a single byte per character, whereas the Unicode encoding emits two bytes, the second of which is zero. To sum up, when hashing NVARCHARs in SQL, the equivalent encoding to use in .NET code is the UnicodeEncoding. When hashing VARCHARs in SQL Server, the UTF8Encoding matches as long as the data is plain ASCII; beyond that range, VARCHAR bytes depend on the column's collation code page and may not match UTF8.

 




Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2008
