got net?

Kevin Hazzard's Brain Spigot

About the author

Welcome to Kevin Hazzard's blog.
E-mail me Send mail

Recent posts

Recent comments

Authors

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2010

Hampton Roads .NET User Group November 2009 Presentation

I presented a talk called "Enterprise Data Validation" at the Hampton Road .NET User Group this evening. The premise was simple. Data validation needs to happen in all the tiers of a modern application but the validation rules should be defined only once to avoid synchronization errors. In this talk, I showed how to extend SQL Server using extended properties to store regular expressions for data validation as column metadata. I also showed how to add a regular expression matcher to SQL Server using the SQL CLR and how to add check constraints to invoke the regular expression parser. Then I built a WCF service to query the validation metadata to make it available in other application tiers. I quickly assembled WCF service host and client showed how you could bring all of the elements together to create a working Enterprise Validation solution.

Download the SQL Scripts (20.06 kb)

Get the Slides

Attendees can Rate My Talk


Posted by kevin on Tuesday, November 10, 2009 11:37 PM
Permalink | Comments (0) | Post RSSRSS comment feed

Exploring SQL Azure

I've been working with SQL Azure for some time now and I really like it from a technology perspective. For the uninitiated, SQL Azure, formerly SQL Data Services (SDS), is Microsoft's foray into relational databases in "the cloud". Microsoft runs a special build of SQL Server that operates in a highly-available configuration on the servers in its Azure data centers. From a system administrator's standpoint, there are some radical differences between SQL Server and SQL Azure. How in the world are we going to live without the BACKUP command or the KILL STATS JOB command, after all? When designing SQL Azure, Microsoft took a long look at the list of features that had grown into SQL Server over time and realized that there were a lot of physicality features that had become baked into the T-SQL language that make no sense whatsoever running in a grid type configuration with thousands of other active databases. In my mind, this is a good thing because it forces Microsoft to think critically about what pyhsical and logical assets really make up the database from an administrator's perspective and from a developer's perspective. Companies like Teradata and Netezza have been thinking this way for some time now in an effort to make their database appliances much simpler to manage. I sincerely hope that some of what Microsoft is learning with SQL Azure creeps back into SQL Server 2011 (or whatever it will be called). If so, it will be a great thing for Microsoft and its customers. In the short term, these tradeoffs will make traditional database and network administrators feel off balance, though.

Developers, on the other hand, will find SQL Azure quite comfortable. Much to my surprise, even my NHibernate-based applications work in SQL Azure without modification. One caveat there: If you use the NHibernate Hbm2Ddl utility as part of a Domain-Driven Design (DDD) process, just watch out for the fact that SQL Azure does not support heap tables. Because of this, every SQL Azure table must have a clustered index so make sure that all the tables in your model have a primary key or at least have a clustered index. (I know, I know. Primary keys don't have to be implemented as clustered indexes in SQL Server but you get the point.) A full list of SQL Server unsupported T-SQL can be found on MSDN. And while the list might look really long, the average application is not likely to encounter a lot of problems running against a SQL Azure database. The T-SQL statements that I'll personally miss the most in SQL Azure? Those would be OPENQUERY, OPENXML, SELECT INTO and NEWSEQUENTIALID.

Database Administration

Let's begin by looking at a bit of administration. The SQL Azure portal is the best place to start. You must log in with your Windows Live ID and click on the link for your project page. The screen for managing your SQL Azure databases is rather sparse. I was expecting to see something like SQL Server Management Studio on the web. Perhaps the web interface will evolve to support more features over time. For now, it presents something like this:

I've masked parts of the screen shot to protect my privacy but you get the idea. The section at the top shows basic information about you SQL Azure "server" with an option to reset the administrator password. The section below shows two tabs labeled "Databases" and "Firewall Settings". The databases tab shown above allows you to create databases, see the connection strings required to connect to them or to drop them. The list shows the current size and size limit for each database. As of this writing, SQL Azure limits databases to 1GB or 10 GB maximum sizes. Hopefully, that will change in the future to allow much larger databases to exist in the cloud. I mean, why build a mesh or grid infrastructure for massive database scaling and limit it to 10 gigabytes? The size of the 3 databases shows them to be zero bytes in size because I truncated them before writing this article. After clicking on one of the radio buttons beside a database name and clicking the "Connection Strings" button, you'll see an AJAX popup that looks something like this:

The popup shows what the ADO.NET and ODBC connections strings would look like in an application configuration file. Notice in both connection strings that there's nothing special about SQL Azure. We can use the plain, old SQL Server Native Client 10.0 over TCP/IP to connect to the Azure database. But can just anyone across the Internet connect to your data? Of course not. Microsoft allows you to restrict connections by IP address ranges or from Windows Azure tasks that you may be running in the MicrosoftServices cloud. The Firewall Settings tab on the main screen is where we can do that. The Firewall Settings screen looks something like this:

Again, I've masked out my IP address in the Record Name that I created called VZW shown here. Since I work on the go using my EVDO card a lot, I need to change the IP Address Range every day, sometimes several times a day. There's a way to change the firewall rules through DML but I've yet to try that. I was thinking of writing a Windows Azure or .NET Services service that would have access via the MicrosoftServices checkbox shown above. I could call the service with another form of authentication to have it update my SQL Azure firewall rules automatically. Until I write that, I'll have to use this web-based console interface to set up the firewall rules. One thing I've noticed during the CTP is that the firewall rules don't take effect immediately. When my IP address changes and I make a rule change in the SQL Azure Firewall Settings, it may take up to 10 minutes to push that change to the firewall and execute it. When I'm ready to make a change, I simply press the "Edit Record" button and an AJAX popup that looks like this is rendered:

Notice that the firewall rule for SQL Azure allows you to specify a range of IP addresses, not just one. That would be handy to use if, for example, all of the addresses within a Class C IPv4 Address block were allowed to connect to the SQL Azure databases you manage. I didn't see any support for IPv6 addresses in the Firewall Settings but I'm supposing that Microsoft will have to support IPv6 in the rules in the future. Additionally, I'd expect to see some richer firewall rules type, e.g. the use of subnet masks to further refine the grant or denial rules and rules based on IPSEC/VPN configuration. For now, IP ranges are enough to get started. I could show you the screens for creating and dropping databases in the web-based console but it's really not all that interesting. Besides, we can do that using the SQLCMD command-line tool as shown in the Database Access section below.

Database Access

I admit it. I'm a UNIX hacker from pre-history so whenever I have the chance to master something at the command line, I jump at it. So when I heard that SQL Azure worked well with the SQLCMD line tool, it brought a grin to my face. Here's a screen shot of a Windows Powershell-based exchange between me and my SQL Azure server. Once again, I've blanked out some of my personal information but this time, it's color coded to help you understand what's important.

The first thing to take note of is that I've issued 4 commands here, number 2# through 5#. Command 2# connects to the master database: notice the -d master parameter at the end of the 2# command line? The 1> prompt means SQLCMD is waiting for input from me. I typed "CREATE DATABASE Blog" followed by Return then "GO" followed by return. Until I provide the "GO" statement, all of my commands would be batched on the client side. When the "GO" command is received, it's not sent to SQL Azure. Instead, it's a signal to the SQLCMD client to send the current batch to the remote server. When the results returned from command 2# show no error, the 1> prompt shows to indicate the start of a new batch. The exit command takes me back to PowerShell.

At this point, our new Blog database has been created in the cloud and we're ready to use it. So in command 3#, I typed in a somewhat lengthy table definition. When I used the "GO" command to execute the batch, however, I got an error saying that the remote host (SQL Azure) has closed the connection. I took too long to type the command so SQL Azure, being a good steward of resources like IP connections, dropped the connection. The fact is that the connection was probably already dropped after I entered the "CREATE TABLE" command at prompt 1> and before I entered the "GO" directive at prompt 2>. But because SQLCMD was batching my commands, it didn't sense that the connection had been dropped until it tried to send the batch to SQL Azure. I haven't found a way to make SQL Azure keep the connection open longer and I probably would advise against using that feature if it exists. IP connections are precious to any kind of server that needs to scale to large number of users. Forcing SQL Azure to keep the connection open longer so you can type is just a bad idea because it would severely impede scalability.

Fortunately, I don't have to type that long "CREATE TABLE" command in again. When I ran SQLCMD again in step 4#, I simply pressed the up arrow on my keyboard and it "remembered" the command. Tapping return and issuing the "GO" command, the table is successfully created in the new Blog database. In the same session, I then started to type an "INSERT" command to put some data into the new Article table. Again, I took too long to do it so you can see that the connection was again closed by SQL Azure before committing the batch. No worries, though. Rerunning SQLCMD and using the up arrow trick saved the day again. The values were inserted successfully on the second attempt.

Now let's talk a bit about those masked out values that I color-coded in the screen shot. The yellow masks are where my user ID is inserted. You can create users and logins using the "CREATE USER" and "CREATE LOGIN" commands in SQL Azure just as you can in SQL Server. Once you've done that, you can use the user IDs with access to a given database to do your work. The @ sign trailing the user ID in each SQLCMD is significant. For whatever reason, you must connect to SQL Azure with the user ID (-U) qualified at (@) your server name. The server name shows in the pink or salmon masks. Also notice that the pink/salmon masks show up again in each command in the server name (-S) section. Just remember to specify your SQL Azure server name after the @ in the -U parameter and again in the -S parameter. Also, the -S parameter must contain the Fully Qualified Domain Name (FQDN) of the server so that the SQLCMD tool can resolve to your SQL Azure server's IP address. Now, let's query the data.

Notice that in command 10# that I used SQLCMD's -q parameter to pass a query string to my SQL Azure Blog database. In this case, it's a SELECT statement that dumps the data that I inserted with command 4# earlier. The output format isn't so pretty but you can tell that the data matches what I inserted before. Since I used the -q (lowercase) parameter, I still have to use the "exit" command to leave the SQLCMD interpreter. In command 11#, I used the -q parameter again to DROP the Article table from the Blog database. And in command 12#, I used the DROP DATABASE statement to drop the Blog database from SQL Azure altogether. In this case, I had to specify the master database using SQLCMD's -d parameter. Also note that by using the -Q (uppercase) parameter in command 12#, the exit statement is implied so I didn't have to exit manually as I did in commands 10# and 11#. That's handy.

Coding to SQL Azure

If you are using ADO.NET or ODBC, the connection strings to your SQL Azure database can be obtained from the SQL Azure web console as shown earlier. Let's take a moment to dissect the ADO.NET connection string while we're on the subject. I'll only address the parts that need some special attention below:

Server=tcp:<server name>.database.windows.net; Database=<database name>; User ID=<user name>; Password=<password>; Trusted_Connection=false;

  • Server=tcp:<server name>.database.windows.net - this is the fully qualified domain name of your server, prefixed with the tcp: directive. This tells the SQL Server Native Client to use the TCP/IP protocol to connect to the FQDN that you specify. If your client is configured to prefer named pipes or some protocol over TCP/IP, the tcp: directive in the connection string tells it to skip directly to TCP/IP instead.
  • User ID=<user name> - unlike SQLCMD, when using ADO.NET, the user name does not have to include @<server name> as the suffix. Just the user name part will do.
  • Trusted_Connection=false - this may not be what you think. This directive doesn't mean that the connection won't be secure. Every SQL Azure Tabular Data Stream (TDS) connection is tunnelled through the Secure Sockets Layer (SSL). Instead, this directive means that we won't be using OS-driven authentication like NTLM or Kerberos.

You can construct a connection string in C# quite simply by using the SqlConnectionStringBuilder class as follows:

private const string UserName = "jrsamples";
private const string Password = "m1Nn1ep3@rL";
private const string ServerName = "br549.database.windows.net";
private const string DatabaseName = "Blog";

...

var connBuilder = new SqlConnectionStringBuilder
                   {
                       DataSource = ServerName,
                       InitialCatalog = DatabaseName,
                       Encrypt = true,
                       TrustServerCertificate = false,
                       UserID = UserName,
                       Password = Password
                   };

Notice that the Encrypt property is set to true in the connection string builder. This isn't strictly required because SQL Azure will force this value to true even if the client does not specify it. You should also note that SQL Azure does not accept connections on any TCP port other than 1433 at this time. So don't try to use a different port in the connection string builder or the connections using it will fail. When you're ready to use the SqlConnectionStringBuilder, invoke the ToString() method to get the full connection string back for use in your code. I'd show you some ADO.NET code here to do INSERT, UPDATE and DELETE but, to be honest, it would be pretty boring. Your ADO.NET code most likely won't have to be changed when moving from SQL Server to SQL Azure.

Product Availability

As of this writing, SQL Azure is still in CTP (Community Technology Preview) and not available for commercial use. The Microsoft Professional Developer Conference (PDC) coming up in mid-November 2009 is the time that's expected for commercial launch of the product. Right now, it appears as though Microsoft is going to limit databases in the SQL Azure cloud to 1GB or 10GB, so many larger-scale commercial applications may have to wait for a time when 100GB or larger databases may be ported. There's no guarantee that will ever happen but one has to assume that Microsoft, once it has gotten some commercial experience serving real customers, will open SQL Azure up to databases that can really show its capabilities.

Pricing

Check Microsoft's SQL Azure Pricing information page for details about cost and measurement.

Closing Thoughts

Microsoft's first attempt at putting SQL Server into the cloud is fairly impressive. And although SQL Server Management Studio can be used with SQL Azure, there are some known compatibility issues that make using command line tools safer for the time being. I'm betting on the fact that Microsoft will make some rich, GUI-based management tools available in due time. After all, Microsoft differentiated itself in the database space years ago by making network and database administrator jobs much easier through the use of great tools. Why wouldn't they continue that trend with SQL Azure? With respect to the query engine and the storage engine in SQL Azure, this first release is fairly strong. The fact that my NHibernate-based applications run without modification is impressive to say the least. If you're accustomed to writing lots of rich stored procedures that use every trick in SQL Server 2008's book, you may encounter some problems when using SQL Azure, though. There are many subtle changes and omissions in the implementation concerning those features that many of us consider to be on the periphery.

Will SQL Azure be a hit in the marketplace? Who knows? That's the big question now. There's little uptake on relational cloud databases in general so it remains to be seen if the popularity of SQL Server will translate into the cloud well. That will have a lot to do with pricing and Microsoft's target market which isn't fully understood just yet. Imagine a medium-sized company that would have to pay for server hardware and SQL Server 2008 Standard Edition plus the Client Access Licenses to make the system available. Then there are the environmental factors like power and cooling to consider. There's also hardware and software maintenance to add in and the people to manage it all at three nines of uptime per month. What's that worth per year? If Microsoft can convince business managers that it's a safe thing to do, that the development experience is great and that the pricing's right, SQL Azure could be quite popular in the marketplace. Only time will tell. Personally, I'm already thinking of clients who could benefit by shedding their servers in favor of cloud databases. I'm definitely going to start small, though, and work my way up. My clients who are spending between $2,000 and $3,500 US per server (Total Cost of Ownership) with less than 10GB of storage are the ones who could benefit the most by considering the move to SQL Azure.


Tags: , ,
Posted by kevin on Sunday, November 01, 2009 1:17 PM
Permalink | Comments (2) | Post RSSRSS comment feed

SQL UNIQUEIDENTIFIERs are Really Big Integers

I wrote a blog post called How SQL Server Sorts the UNIQUEIDENTIFIER Type and another one called Ordering the SQL UNIQUEIDENTIFIER Type Numerically Correct for Reporting a while back. As a result, I get a lot of e-mails from people struggling with UNIQUEIDENTIFIER values in Microsoft SQL Server. That's cool because I like helping other developers. The mistake that most people make when working with this data type is treating them like strings. However, UNIQUEIDENTIFIERS are absurd looking integers, really big ones. We show them in hexadecimal format to make them more compact which adds to their absurdness, I suppose.

As I demonstrated in my previous blog posts, SQL Server adds to the absurdity by making the readable version of UNIQUEIDENTIFIER values fundamentally different from their numerical handling. For example, a UNIQUEIDENTIFIER that reads as FFEEDDCC-BBAA-9988-6677-001122334455 in a SQL script will be treated as an integer that we humans would read from left to right as 00112233-4455-6677-8899-AABBCCDDEEFF. We expect the most significant digits of a number to appear on the left and the least significant digits to appear on the right. But SQL Server doesn't work that way. Here's some T-SQL code that will create a table called [TestValue] and populate it with some UNIQUEIDENTIFIER values.

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [dbo].[TestValue]
(
    [RowId] [INT] IDENTITY(1,1) PRIMARY KEY NOT NULL,
    [UID] [uniqueidentifier] NOT NULL,
    [ReadableUID] [nchar](36) NULL
)
GO

CREATE FUNCTION [dbo].[NumericallyCorrectUid]
(
    @uid UNIQUEIDENTIFIER
)
RETURNS NCHAR(36)
AS
BEGIN
    DECLARE @result NCHAR(36)
    SET @result = CONVERT(NCHAR(36), @uid)
    SET @result =
        SUBSTRING(@result, 25, 8)
        + N'-'
        + RIGHT(@result, 4)
        + SUBSTRING(@result, 19, 6)
        + SUBSTRING(@result, 17, 2)
        + SUBSTRING(@result, 15, 2)
        + N'-'
        + SUBSTRING(@result, 12, 2)
        + SUBSTRING(@result, 10, 2)
        + SUBSTRING(@result, 7, 2)
        + SUBSTRING(@result, 5, 2)
        + SUBSTRING(@result, 3, 2)
        + LEFT(@result, 2)
    RETURN @result
END
GO

CREATE TRIGGER [dbo].[trg_UpdateReadableUid]
   ON [dbo].[TestValue] AFTER INSERT
AS
BEGIN
    UPDATE [TV]
        SET [ReadableUid] = dbo.NumericallyCorrectUid([TV].[UID])
        FROM [dbo].[TestValue] AS [TV]
        JOIN inserted AS [I]
            ON [TV].[UID] = [I].[UID]
END
GO

INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
INSERT INTO [UUIDTest].[dbo].[TestValue] ([UID]) VALUES (NEWID())
GO

SELECT [RowId], [UID], [ReadableUID]
    FROM [dbo].[TestValue]
    ORDER BY [RowId]

The query at the end shows the UNIQUEIDENTIFIER values in the order that they were inserted. On my computer, they appear as follows. Please understand that on your computer you will get different values. If you didn't, we would have to remove the UNIQUE from the data type's name, wouldn't we?

RowId UID                                   ReadableUID
===== ====================================  ====================================
1     21321236-C387-4F81-83C5-201B3ECCFFC9  201B3ECC-FFC9-83C5-814F-87C336123221
2     4159FB16-F10C-4C03-AABD-6A6BBB092ABA  6A6BBB09-2ABA-AABD-034C-0CF116FB5941
3     0F4F2022-BAB4-411C-B66B-8C63167987B7  8C631679-87B7-B66B-1C41-B4BA22204F0F
4     5F326809-C47A-4149-AAA3-8E3F1C8419A2  8E3F1C84-19A2-AAA3-4941-7AC40968325F
5     31243180-AFB1-427A-A8D9-04EEA9866224  04EEA986-6224-A8D9-7A42-B1AF80312431
6     B1731F5E-13BA-4683-A846-020D7121FDEB  020D7121-FDEB-A846-8346-BA135E1F73B1
7     FABD2006-D8CF-44EE-8774-7D3052FF5A28  7D3052FF-5A28-8774-EE44-CFD80620BDFA
8     A4547257-E3C5-4EEF-ABC0-246D96DAE4A1  246D96DA-E4A1-ABC0-EF4E-C5E3577254A4
9     B03AF9E2-583F-44A6-B99D-169457FFA629  169457FF-A629-B99D-A644-3F58E2F93AB0
10    502FA784-5308-4A33-9F1A-36816860BB61  36816860-BB61-9F1A-334A-085384A72F50

The first thing to notice is that the UNIQUEIDENTIFIERs were inserted in what seems like random order. This is because I used the NEWID() function in my INSERT statements to generate the UNIQUEIDENTIFIER values. If I had used the NEWSEQUENTIALID() function instead, the values would have been in ascending order when sorted by the [RowId]. The second thing to take note of is that comparing the [ReadableID] version of each [UID] reveals the pattern I showed above. Namely, the bytes of the [ReadableID] represented as 00112233-4455-6677-8899-AABBCCDDEEFF show in each related [UID] in the order FFEEDDCC-BBAA-9988-6677-001122334455. Now, let's order the results differently:

SELECT [RowId], [UID], [ReadableUID] FROM [dbo].[TestValue]
    ORDER BY [UID]

RowId UID                                   ReadableUID
===== ====================================  ====================================
6     B1731F5E-13BA-4683-A846-020D7121FDEB  020D7121-FDEB-A846-8346-BA135E1F73B1
5     31243180-AFB1-427A-A8D9-04EEA9866224  04EEA986-6224-A8D9-7A42-B1AF80312431
9     B03AF9E2-583F-44A6-B99D-169457FFA629  169457FF-A629-B99D-A644-3F58E2F93AB0
1     21321236-C387-4F81-83C5-201B3ECCFFC9  201B3ECC-FFC9-83C5-814F-87C336123221
8     A4547257-E3C5-4EEF-ABC0-246D96DAE4A1  246D96DA-E4A1-ABC0-EF4E-C5E3577254A4
10    502FA784-5308-4A33-9F1A-36816860BB61  36816860-BB61-9F1A-334A-085384A72F50
2     4159FB16-F10C-4C03-AABD-6A6BBB092ABA  6A6BBB09-2ABA-AABD-034C-0CF116FB5941
7     FABD2006-D8CF-44EE-8774-7D3052FF5A28  7D3052FF-5A28-8774-EE44-CFD80620BDFA
3     0F4F2022-BAB4-411C-B66B-8C63167987B7  8C631679-87B7-B66B-1C41-B4BA22204F0F
4     5F326809-C47A-4149-AAA3-8E3F1C8419A2  8E3F1C84-19A2-AAA3-4941-7AC40968325F

Zero in on the [RowId] values first. They are out of order because we ordered by the [UID] instead. But glancing at the [UID] and thinking of them as integers, they don't look ordered either. Look at the first digit of the first two rows, digit B (value 11) certainly comes after 3 numerically. So, how could this be ordered properly? Now look at the [ReadableID] column. Lo, and behold, that column appears to be sorted in ascending fashion. But we didn't order by that column so what's going on here? Again, the way we see a UNIQUEIDENTIFIER as human beings and the way SQL treats these really large integers is quite different. And these differences persist even when we humans try to convey UNIQUEIDENTIFIERs as strings in our scripts. For example, if I wanted to use the ordering by [UID] shown in the last query and return the last five rows ([RowId] 10, 2, 7, 3 and 4) by using the greater than or equal operator, I could do it as follows:

SELECT [RowId], [UID], [ReadableUID] FROM [dbo].[TestValue]
    WHERE [UID] >= '502FA784-5308-4A33-9F1A-36816860BB61'
    ORDER BY [UID]

Which would return the last 5 rows of the query shown before as:

RowId UID                                   ReadableUID
===== ====================================  ====================================
10    502FA784-5308-4A33-9F1A-36816860BB61  36816860-BB61-9F1A-334A-085384A72F50
2     4159FB16-F10C-4C03-AABD-6A6BBB092ABA  6A6BBB09-2ABA-AABD-034C-0CF116FB5941
7     FABD2006-D8CF-44EE-8774-7D3052FF5A28  7D3052FF-5A28-8774-EE44-CFD80620BDFA
3     0F4F2022-BAB4-411C-B66B-8C63167987B7  8C631679-87B7-B66B-1C41-B4BA22204F0F
4     5F326809-C47A-4149-AAA3-8E3F1C8419A2  8E3F1C84-19A2-AAA3-4941-7AC40968325F

When using the UNIQUEIDENTIFIER for the first row I want, it's important to note that I have to use SQL Server's numerically significant format of 502FA784-5308-4A33-9F1A-36816860BB61 instead of the readable-as-integer 36816860-BB61-9F1A-334A-085384A72F50. In fact, if I were to try to use the version that allows me to read the value from left to right as an integer instead, I would get very different results. Try it for yourself to see what happens.


Categories: CapTech | Series | Software Dev | SQL
Posted by kevin on Saturday, October 24, 2009 4:27 PM
Permalink | Comments (0) | Post RSSRSS comment feed

PyTip: Avoid Using range() for Large Sequences

When iterating over a sequence of numbers in Python, the range() function is commonly used. However, the implementation of the range() function in Python 2.x instantiates each element in the sequence before the iteration begins. This is really costly from both memory and CPU perspectives when the desired range of numbers is large. Consider using the xrange() function instead which implements a Python generator to yield each number in the sequence as needed. Using xrange() instead of range() for large iterations can have a big, positive impact on your code. For example, in an application I was working on recently, replacing range() calls with xrange() boosted my performance from ~900,000 transactions per second to over 3,000,000. In Python 3.x, the range() function is supposed to be implemented as a generator but I haven't tested that to be true yet. Let me know if you have.


Posted by kevin on Monday, September 21, 2009 7:00 AM
Permalink | Comments (4) | Post RSSRSS comment feed

Dynamic Language Runtime Performance Demos

I spoke at the Charlottesville .NET User Group this week and at the Raleigh Code Camp. I cheated and did the same presentation to both groups. Call me lazy but, in the middle of planning our own Code Camp in Richmond, I really didn't want to prepare two separate talks. I did a talk back at CodeStock 2009 on a similar topic back in June 2009 but it's evolved a lot since then based on my own growth and understanding. You can find the code and slides below.

This talk takes about an hour and covers the basic concepts of metaprogramming and how the DynamicMetaObject class, the IDynamicMetaObjectProvider interface and the DynamicObject helper class work in the .NET Framework 4.0. The talk is very demo-centric, focusing on the code that support the concepts outlined in a few slides. The first demo focuses on helping you to understand the dispatch mechanism in the DLR and how you can implement your own classes that participate in the call binding process. It's a fairly simple fluent XML parser implemented using the DynamicObject helper class. While it's simplistic, it helps to introduce developers who are new to the DLR to the concepts of dispatch and binding in C# before diving into things Pythonic.

The second demo focuses on performance measurement. Having learned about the delegate and rule caching that goes on in the DLR's CallSite<T> and ActionBinder classes in the presentation, these tests show the effect of all that awesome work that Microsoft has done for us to enhance the performance of dynamic code. The result is that we see the caching yields about 5,000 times the performance of traditional static to dynamic language interoperability. Rather than the hundreds of transactions per second that we are accustomed to when doing dynamic language invocation from a context like C#, we now get millions of transactions per second. To put it all in context, the same transactions performed in pure C# are still about 4 times as fast as the DLR with delegate and rule caching. However, you have to admit that what the DLR provides today is pretty amazing. For many, this level of performance makes the use of languages like Python for scripting a large application feasible for the first time.

The question I most often get about this is, "When will the dynamic language performance be just as good as 'real' .NET code?" OK, so I take exception to the term 'real' there but I know what you all mean. I don't know the answer and it doesn't really matter that Ruby and Python performance be just as good as C#. Different languages have their own strengths and sometimes that includes performance. But when pure C# generates 6 million transactions per second, traditional thunking across to dynamic languages performs at a rate of several hundred per second, the fact that adaptive Inline Caching in the DLR helps me to yield close to 2 million transactions per second is pretty darned good. We're well on the way to my other favorite language, Python, becoming a first-class citizen in the .NET family. It takes education, of course, for people to know just how wonderful the DLR is which is the real purpose of my evangelism. I just love the DLR, if you can't tell already.

I am doing a Channel 9 geekSpeak next week on this topic and I'll refer to some of this material. And my talk at the October 2009 NoVA Code Camp will use an even further evolved version of this talk. Drop me a note and let me know what you think about this topic. Do you have success stories? Horror stories regarding dynamic language integration?

Slides for DLR Performance Talk September 2009 [PPTX] (184 kB)

Slides for DLR Performance Talk September 2009 [PDF] (680 kB)

Code for DLR Performance talk September 2009 [Updated 21 Sept 2009] (16 kB)


Posted by kevin on Saturday, September 19, 2009 4:20 PM
Permalink | Comments (2) | Post RSSRSS comment feed

How I Learned to Love Metaprogramming

UPDATED on 30 June 2009

I spoke at the CodeStock 2009 conference and I thought it would be helpful for the attendees and others to be able to download my code and slides. The title of my presentation was "How I Learned to Love Metaprogramming" and it concerns Dynamic Language Runtime architecture, performance of dynamic typing and Python to C# integration. The slides and source code are linked below. I will be giving this talk again in September at the Charlottesville .NET User Group meeting. Both of the demos require C# 4.0 which is available in Visual Studio 2010.

  • Demo One - shows how to do XML parsing using a fluent interface based on a DynamicObject derivation in C# 4.0
  • Demo Two - shows how the Level 0, 1 and 2 CallSite and ActionBinder caches perform. UPDATED: I added a demo on 30 June 2009 that shows how the DLR 0.9 compares by invoking dynamic code through the DLR hosting APIs, thereby bypassing the CallSite caching mechanisms. The results are very instructive, showing that the DLR's polymorphic inline caching can yield a 250000% increase in performance. You read that correctly: a two hundrend fifty thousand percent increase in performance.

Slides in PDF (Acrobat) format (688.49 kb) 
Slides in PPTX (PowerPoint 2007) format (639.23 kb) 
Demo One Source Code - MetaObjectPlay200905.zip (5.17 kb) 
Demo Two Source Code - PythonIntegration200906.zip (5.33 kb)


Posted by kevin on Saturday, June 27, 2009 8:40 AM
Permalink | Comments (2) | Post RSSRSS comment feed

Remembering the Sony Magic Link

Sony Magic Link PIC-1000

I admit that I've been a geek for a good long while now. 30 years ago, I was taking computers apart and soldering new things into them to experiment or to force them to suit my needs. Those were fascinating times in the computer industry to be sure. Operating systems were simple enough to be understood in their entirety by a single person. Software applications were also very simplistic. Kids in 2009 would LOL at our total lack of sophistication but we were enthralled to be able to move circle and squares sloppily about the screen to simulate bombs or gnomes or whatever we could imagine. We had to use our imagination more in those early days and it wasn't a bad thing per se. Poor graphics and poor sound weren't an impediment to us because we didn't know that they were bad. But we knew they could be better and that kept us going.

At the halfway point on my journey thus far, I bought a Sony Magic Link PIC-1000 handheld computer. It was 1994. I think I paid about $800 for it at the time. I remember the excitement I felt as I opened the package and powered it up. It ran the Magic CAP operating system from General Magic and it was a thing of pure joy, I'll tell you. The screen sporting 16 shades of gray was organized into a virtual desktop complete with drawers that you could open and close by tapping them. You could drag objects into and out of the drawers. It had a filing cabinet, an on screen keyboard and a graphical toolbar of contextual tool icons at the bottom of the screen. The built-in modem would allow you to connect an e-mail or fax provider to exchange messages and documents with others.

I used my Sony Magic Link every day for a couple of years and it taught me a lot about the value of good software design. Some of the applications were clunky but, for the most part, it was a very functional device. Sony positioned the Magic Link as an upscale alternative to Apple's Newton. Having used both of them, I believed the Magic Link to be a far superior device for getting things done. The screen was bigger. It did decent animations. It had a built-in modem and a PCMCIA card slot. And the sound was good for the period. The Newton was cool but it was a glorified notepad, in my opinion. There's no doubt that Apple studied the Magic Link with great care to learn from it. As an iPhone user today, I often think about how similar my iPhone is, in so many ways, to the seminal PDA from Sony. Many of the Magic Link's gestures and graphical metaphors live on in the iPhone today, with or without attribution from Apple. Microsoft didn't seem to learn much from it though as Windows Mobile 6.5 still feels like a desktop OS crammed into a device not so well designed to run it. The truth hurts, I know.

I'm pretty sure that I still have that old Magic Link lying around in the attic somewhere. As I sit here in the White Hawk Music Café waiting for my son to complete his guitar lesson, I bristle with some of that old excitement I felt in 1994, wondering if I find that old device in the attic, will it still work? Will it be as fun today as it was then? Is there something I can solder into it to make it do something fun and new? Some things never change, I suppose. And that's good, too. Never stop learning. That goes for you, too, Mr. Ballmer. Watch and learn.


Categories: CapTech | Fun
Posted by kevin on Wednesday, June 17, 2009 7:00 PM
Permalink | Comments (8) | Post RSSRSS comment feed

Fluent XML Parsing Using C#'s Dynamic Type Part 2

In part 1 of this article, I showed you how to create a fluent parser for XML documents as a dynamic type in C# 4. With that dynamic class, you can use the dot (.) and index ([]) operators to traverse any XML document with a very natural syntax. Being able to read XML fluently is great. But to be really useful, our dynamic class should allow the programmer to modify the XML document fluently as well. To do this, we'll need to modify the behavior of the overridden TryGetMember method to create missing XElements whenever they are referenced by the code. Here's the modified code for the TryGetMember method:

public override bool TryGetMember(
    GetMemberBinder binder, out object result)
{
    result = null;

    /* handle the Value and Count special cases */
    if (binder.Name == "Value")
        result = _elements[0].Value;
    else if (binder.Name == "Count")
        result = _elements.Count;
    else
    {
        /* try to find a named attribute first */
        var attr = _elements[0].Attribute(
            XName.Get(binder.Name));
        if (attr != null)
        {
            /* if a named attribute was found,
               return that NON-dynamic object */

            result = attr;
        }
        else
        {
            /* find the named descendants */
            var items = _elements.Descendants(
                XName.Get(binder.Name));
            if (items != null && items.Count() > 0)
            {
                /* prepare a new dynamic object with
                   the list of found descendants */

                result = new DynamicXml(items);
            }
        }
    }
    if (result == null)
    {
        /* not found, create a new element here */
        _elements[0].AddFirst( new XElement( binder.Name ) );
        result = new DynamicXml( _elements[0].Descendants().First() );
    }
    return true;
}

Notice near the bottom of the TryGetMember method that when the object named in the binder is not found, a new XElement is created and added as the first child of the current element. Then, a DynamicXml wrapper is created to continue the fluent chain to the next call as may be necessary. Since the specially handled "Value" member is used to get the value of a DynamicXml element, we also need to handle the "set" case for the times when the programmer attempts to assign a Value to an element. To do that, we need to override the TrySetMember method in the DynamicXml class like this:

public override bool TrySetMember(
    SetMemberBinder binder, object value)
{
    if (binder.Name == "Value")
    {
        /* the Value property is the only one that
            may be modified. TryGetMember actually
            creates new XML elements in this
            implementation */

        _elements[0].Value = value.ToString();
        return true;
    }
    return false;
}

With these two changes in place, we can write code to modify the underlying XML document, too. In the following code that exercises both of the code changes, notice that the first author of the second book has no middle name element in the source XML document. With the changes we've made to TryGetMember and TrySetMember, however, this isn't a problem at all. Look at the code that assigns the middle name value of "Lisa" to that author below.

var xml = "<books pubdate='2009-06-15'>" +
 "<book price='45.99' title='Open Heart Surgery for Dummies'>" +
  "<id isbn10='4389880339'/>" +
  "<authors>" +
   "<author>" +
    "<name>" +
     "<first>Mortimer</first>" +
     "<middle>Q.</middle>" +
     "<last>Snerdly</last>" +
    "</name>" +
    "<email address='mort@surgery.com'/>" +
   "</author>" +
  "</authors>" +
 "</book>" +
 "<book price='32.75' title='Skydiving on a Budget'>" +
  "<id isbn='2129034454'/>" +
  "<authors>" +
   "<author>" +
    "<name>" +
     "<first>Trudy</first>" +
     "<last>Freefall</last>" +
    "</name>" +
    "<email address='tfreefall@jump.com'/>" +
   "</author>" +
   "<author>" +
    "<name>" +
     "<first>Bernard</first>" +
     "<middle>M.</middle>" +
     "<last>Fallson</last>" +
    "</name>" +
    "<email address='bernie@airborne.com'/>" +
   "</author>" +
  "</authors>" +
 "</book>" +
"</books>";

dynamic dx = new DynamicXml(xml);
dx.book[1].authors.author[0].name.middle.Value = "Lisa";

foreach (dynamic b in dx.book)
{
    Console.WriteLine("----- Begin Book -----");
    Console.WriteLine("Price='{0}'", b.price.Value);
    Console.WriteLine("Title='{0}'", b.title.Value);
    Console.WriteLine("AuthorCount='{0}'", b.authors.author.Count);
    foreach (dynamic a in b.authors.author)
    {
        Console.WriteLine("   ---- Begin Author ----");
        Console.WriteLine("   EmailAddress='{0}'", a.email.address.Value);
        Console.WriteLine("   FirstName='{0}'", a.name.first.Value);
        Console.WriteLine("   MiddleName='{0}'", a.name.middle.Value);
        Console.WriteLine("   LastName='{0}'", a.name.last.Value);
        Console.WriteLine("   ----- End Author -----");
    }
    Console.WriteLine("------ End Book ------");
}

When information about the first author of the second book is dumped to the console, we see that it contains the middle name that was assigned in the fluent mutation of the XML document:

---- Begin Author ----
EmailAddress='tfreefall@jump.com'
FirstName='Trudy'
MiddleName='Lisa'
LastName='Freefall'
----- End Author -----

Nice and easy, right? That's enough for this installment. In the next article in this series, I'll add some error handling code and deal with "the XML attribute problem" as it will come to be known. Enjoy.


Tags: ,
Categories: C# | CapTech
Posted by kevin on Monday, June 15, 2009 6:06 AM
Permalink | Comments (1) | Post RSSRSS comment feed

Ordering the SQL UNIQUEIDENTIFIER Type Numerically Correct for Reporting

As a follow-up to my last post on How SQL Server Sorts the UNIQUEIDENTIFIER Type, I thought it would be useful to have a function that would reorder the bytes of UNIQUEIDENTIFIERS whenever I need to show them in numerically correct order. Here's the User-Defined Function (UDF) I wrote to do this:

-- =============================================
-- Author:      W. Kevin Hazzard
-- Create date: 14 June 2009
-- Description: Reorder the bytes of a
--              UNIQUEIDENTIFIER to show it as
--              a numerically correct string.
-- =============================================
CREATE FUNCTION [dbo].[NumericallyCorrectUid]
(
    @uid UNIQUEIDENTIFIER
)
RETURNS NCHAR(36)
AS
BEGIN
    DECLARE @result NCHAR(36)
    SET @result = CONVERT(NCHAR(36), @uid)
    SET @result =
        SUBSTRING(@result, 25, 8)
        + N'-'
        + RIGHT(@result, 4)
        + SUBSTRING(@result, 19, 6)
        + SUBSTRING(@result, 17, 2)
        + SUBSTRING(@result, 15, 2)
        + N'-'
        + SUBSTRING(@result, 12, 2)
        + SUBSTRING(@result, 10, 2)
        + SUBSTRING(@result, 7, 2)
        + SUBSTRING(@result, 5, 2)
        + SUBSTRING(@result, 3, 2)
        + LEFT(@result, 2)
    RETURN @result
END
GO

Invoking the UDF is easy. Here's a little script that demonstrates how to do it:

DECLARE @uid UNIQUEIDENTIFIER
SET @uid = N'EBC23DE8-DC16-4A8B-8E11-6B1509B0DAED'
PRINT @uid

DECLARE @correctUid NCHAR(36)
SELECT @correctUid = [<your DB name here>].[dbo].[NumericallyCorrectUid] (@uid)
PRINT @correctUid

This outputs the following text in the Messages window of SQL Server Management Studio:

EBC23DE8-DC16-4A8B-8E11-6B1509B0DAED
6B1509B0-DAED-8E11-8B4A-16DCE83DC2EB

This matches the octet ordering that I showed in the previous blog post. Of course, if you attempt to create a new UNIQUEIDENTIFER in SQL using the reordered string, the new value will not be equal to the orginal. So be careful to use the reordered string only for reporting or whenever you need to see the UNIQUEIDENTIFIER values in numerically correct order. Enjoy.


Tags:
Categories: CapTech | SQL
Posted by kevin on Sunday, June 14, 2009 11:51 AM
Permalink | Comments (0) | Post RSSRSS comment feed

How SQL Server Sorts the UNIQUEIDENTIFIER Type

When you read sorted UNIQUEIDENTIFIER values in the query results from Microsoft SQL Server, it's not immediately clear how they may be ordered. Alberto Ferrari wrote a blog post about this subject a while ago. However, I didn't find that Alberto's article highlighted the answer to my fundamental questions about UNIQUEIDENTIFIER sorting. Numbering the octets in the UNIQUEIDENTIFIER from left to right as 0..9 then A..F, what is the Most Significant Byte (MSB) to Least Significant Byte (LSB) ordering of the type from SQL Server's perspective?

You can really think of a UNIQUEIDENTIFIER like a really big integer: 128 bits or 16 bytes wide. However, when you look at the 32 hexadecimal digits of a UNIQUEIDENTIFIER expressed as a hyphenated string, it's not clear that SQL is treating this type like a number that we would read from left to right. For example, look at the following two UNIQUEIDENTIFIERs generated by SQL Server's NEWID() function:

  • EBC23DE8-DC16-4A8B-8E11-6B1509B0DAED
  • 4F899E16-9D3E-4EA6-8A32-749A3FCAD865

Do these look like large integers? Maybe if you took the hyphens out. If they were numbers, which of them would be the larger one? Reading left to right, anyone with a cursory understanding of hexadecimal notation might assume that the first value beginning with EB is larger than the one beginning with 4F. But that's not correct from SQL Server's perspective. The second one is the larger UNIQUEIDENTIFIER to SQL Server. To prove this, run the following query in SQL Server Management Studio:

WITH [UIDs] AS ( --                        0 1 2 3  4 5  6 7  8 9  A B C D E F
          SELECT [ID] = '0', [UID] = CAST('01000000-0000-0000-0000-000000000000' AS UNIQUEIDENTIFIER)
    UNION SELECT [ID] = '1', [UID] = CAST('00010000-0000-0000-0000-000000000000' AS UNIQUEIDENTIFIER)
    UNION SELECT [ID] = '2', [UID] = CAST('00000100-0000-0000-0000-000000000000' AS UNIQUEIDENTIFIER)
    UNION SELECT [ID] = '3', [UID] = CAST('00000001-0000-0000-0000-000000000000' AS UNIQUEIDENTIFIER)
    UNION SELECT [ID] = '4', [UID] = CAST('00000000-0100-0000-0000-000000000000' AS UNIQUEIDENTIFIER)
    UNION SELECT [ID] = '5', [UID] = CAST('00000000-0001-0000-0000-000000000000' AS UNIQUEIDENTIFIER)
    UNION SELECT [ID] = '6', [UID] = CAST('00000000-0000-0100-0000-000000000000' AS UNIQUEIDENTIFIER)
    UNION SELECT [ID] = '7', [UID] = CAST('00000000-0000-0001-0000-000000000000' AS UNIQUEIDENTIFIER)
    UNION SELECT [ID] = '8', [UID] = CAST('00000000-0000-0000-0100-000000000000' AS UNIQUEIDENTIFIER)
    UNION SELECT [ID] = '9', [UID] = CAST('00000000-0000-0000-0001-000000000000' AS UNIQUEIDENTIFIER)
    UNION SELECT [ID] = 'A', [UID] = CAST('00000000-0000-0000-0000-010000000000' AS UNIQUEIDENTIFIER)
    UNION SELECT [ID] = 'B', [UID] = CAST('00000000-0000-0000-0000-000100000000' AS UNIQUEIDENTIFIER)
    UNION SELECT [ID] = 'C', [UID] = CAST('00000000-0000-0000-0000-000001000000' AS UNIQUEIDENTIFIER)
    UNION SELECT [ID] = 'D', [UID] = CAST('00000000-0000-0000-0000-000000010000' AS UNIQUEIDENTIFIER)
    UNION SELECT [ID] = 'E', [UID] = CAST('00000000-0000-0000-0000-000000000100' AS UNIQUEIDENTIFIER)
    UNION SELECT [ID] = 'F', [UID] = CAST('00000000-0000-0000-0000-000000000001' AS UNIQUEIDENTIFIER)
)
SELECT [ID], [UID] FROM [UIDs] ORDER BY [UID] DESC

This query is based on the one that Alberto Ferrari showed in his blog. But this one goes after the answer that I'm interested in a bit more directly. The query creates a rowset that associates each distinct octet in UNIQUEIDENTIFIERs with an order identifier, 0..9 then A..F. Then, by ordering the results of the query by the the UNIQUEIDENTIFIER values, we should be able to tell which octets are more significant numerically than the others according to SQL Server. Here's what the results look like:

A 00000000-0000-0000-0000-010000000000
B 00000000-0000-0000-0000-000100000000
C 00000000-0000-0000-0000-000001000000
D 00000000-0000-0000-0000-000000010000
E 00000000-0000-0000-0000-000000000100
F 00000000-0000-0000-0000-000000000001
8 00000000-0000-0000-0100-000000000000
9 00000000-0000-0000-0001-000000000000
7 00000000-0000-0001-0000-000000000000
6 00000000-0000-0100-0000-000000000000
5 00000000-0001-0000-0000-000000000000
4 00000000-0100-0000-0000-000000000000
3 00000001-0000-0000-0000-000000000000
2 00000100-0000-0000-0000-000000000000
1 00010000-0000-0000-0000-000000000000
0 01000000-0000-0000-0000-000000000000

Do you see the pattern? Reading the IDs in the left column from top to bottom, we can see that the most significant octet is in the A position and the least significant one is at position 0. The pattern that represents the most significant to the least significant bytes reads as ABCDEF8976543210 which is not exactly what you might expect. Looking back at the new UNIQUEIDENTIFIER values shown above, now we know why EBC23DE8-DC16-4A8B-8E11-6B1509B0DAED is considered to be less than 4F899E16-9D3E-4EA6-8A32-749A3FCAD865 numerically by SQL Server. The 6B octet at position A in the first UNIQUEIDENTIFER is less than the 74 octet at the same position in the second value. In fact, if we were to reorder the all of the bytes in those two UNIQUEIDENTIFIERs left to right according to the way that SQL Server really "sees" them numerically, they would need to undergo these transformations:

  • EBC23DE8-DC16-4A8B-8E11-6B1509B0DAED would be seen as 6B1509B0DAED-8E11-8B4A-16DC-E83DC2EB
  • 4F899E16-9D3E-4EA6-8A32-749A3FCAD865 would be seen as 749A3FCAD865-8A32-A6E4-3E9D-169E894F

So, now we can read SQL Server UNIQUEIDENTIFIERs numerically. That will come in handy one day so tuck it into the back of your brain. Enjoy.


Tags:
Categories: CapTech
Posted by kevin on Friday, June 12, 2009 10:03 PM
Permalink | Comments (1) | Post RSSRSS comment feed