Dictionary Classes Benchmarked

by kevin 7/13/2008 8:30:00 PM

A few days ago, I stumbled on an article by Amit Raz about the SortedList<K,T> on Dev102.com. In the article, which compares and contrasts the SortedList collection class in the .NET BCL to the SortedDictionary class, Amit concludes with, "So what is the SortedList good for? Beats me. I deem it useless." His conclusion seemed to be predicated on the fact that the Add() method throws an exception if the programmer attempts to insert an entry whose key already exists in a SortedList. However, this is documented behavior, and both the SortedList and the SortedDictionary exhibit it. Each of those collections also has an indexer that accepts a key that is already present. For both classes, assigning through the indexer replaces the original value.
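The difference between Add() and the indexer is easy to check. The following is a minimal sketch, not code from Amit's article; the class and method names (DuplicateKeyDemo, AddThrowsOnDuplicate, ReplaceViaIndexer) are made up for illustration. It works against IDictionary<K,T>, which both collections implement.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateKeyDemo
{
    // Returns true if Add() rejects a key that is already present.
    public static bool AddThrowsOnDuplicate(IDictionary<string, int> dict)
    {
        dict.Add("apple", 1);
        try
        {
            dict.Add("apple", 2);   // same key a second time
            return false;
        }
        catch (ArgumentException)
        {
            return true;            // documented behavior for both classes
        }
    }

    // Assigning through the indexer overwrites the existing value instead.
    public static int ReplaceViaIndexer(IDictionary<string, int> dict)
    {
        dict["apple"] = 1;
        dict["apple"] = 2;          // no exception; the old value is replaced
        return dict["apple"];
    }

    public static void Main()
    {
        Console.WriteLine(AddThrowsOnDuplicate(new SortedList<string, int>()));
        Console.WriteLine(AddThrowsOnDuplicate(new SortedDictionary<string, int>()));
        Console.WriteLine(ReplaceViaIndexer(new SortedList<string, int>()));
        Console.WriteLine(ReplaceViaIndexer(new SortedDictionary<string, int>()));
    }
}
```

Both collections should behave identically here, which is the point: the exception from Add() is not something that distinguishes the SortedList from the SortedDictionary.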

I had a difficult time following Amit's logic but he's a bright fellow so I wanted to find out if the SortedList really was useless as he had proclaimed. On this page, Microsoft provides a table that describes the benefits and relative drawbacks of these two seemingly similar classes. Long story short, the key advantages of the SortedList<K,T> are that it uses less memory than the SortedDictionary<K,T> and that some of its members that return keys and values are indexed. OK, so the SortedList uses less memory. But what's that latter claim about?

Well, the SortedList<K,T> has a few members that the SortedDictionary<K,T> does not. Among them are:

  • int IndexOfKey( K key )
  • int IndexOfValue( T value )

Kicking around in this class in Reflector, one sees that the internal storage for the SortedList<K,T> is a pair of parallel arrays: one for the keys, which is kept in sorted order, and one for the values, in which each value sits at the same index as its key. When using the IndexOfKey method noted above, Array.BinarySearch() is used to perform an efficient (O(log n)) search for the desired key. However, the IndexOfValue method uses a brute force (O(n)) scan to find the requested value. Another special-case advantage of the SortedList is that insertions are cheap for data inserted in sorted order: each new entry lands at the end of the arrays, so the cost is the O(log n) binary search plus an append, with no shifting of existing elements. For the SortedDictionary, the insertion cost is also around O(log n), but for data already in sorted order it pays a bit of a penalty on insertion because of the required balancing of the tree structure used to store the information. More on that later. So, if you have small lists and the keys are already sorted before insertion, the SortedList might be a good choice. If your keys are not sorted, insertion into a SortedList could be as bad as O(n), because every element after the insertion point has to be shifted. So it may be that you have to know something about your data to use the SortedList in a way that makes it worthwhile. Not understanding your data could make the SortedList worthless, as Amit claims.
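To make the indexed members concrete, here is a small sketch using only the public SortedList<K,T> API. The class name IndexLookupDemo and the sample data are assumptions for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class IndexLookupDemo
{
    public static SortedList<int, string> BuildSample()
    {
        var list = new SortedList<int, string>();
        // Inserted out of order; the list keeps the keys sorted.
        list.Add(30, "thirty");
        list.Add(10, "ten");
        list.Add(20, "twenty");
        return list;
    }

    public static void Main()
    {
        SortedList<int, string> list = BuildSample();

        // Binary search over the sorted keys.
        Console.WriteLine(list.IndexOfKey(20));

        // Linear scan over the values.
        Console.WriteLine(list.IndexOfValue("thirty"));

        // Keys and Values can be read by position, which SortedDictionary does not offer.
        Console.WriteLine(list.Keys[0] + " = " + list.Values[0]);

        // A key that is not present yields -1.
        Console.WriteLine(list.IndexOfKey(99));
    }
}
```

Because the keys are stored sorted, the index returned for a key is its rank in sort order, not the order in which it was added.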

I wrote a small test harness that exercises the SortedList<K,T> and SortedDictionary<K,T>. I've included a link to the source code below. The application runs a battery of tests using a small list of 1,000 items on each type of dictionary and the same battery using a large list of 40,000 items. The main window looks like this:

These results show an average test run on my Windows Server 2008 machine using the .NET Framework 3.5. For the small list of 1,000 items, the SortedList seems to be a bit more efficient than the SortedDictionary with respect to time, despite the fact that the keys are not in sorted order for that test. However, when it comes to the larger list of 40,000 items, the SortedDictionary is the clear winner for both insertions and removals. But what about memory? Remember that Microsoft's MSDN topic said that the SortedList can be more memory efficient? I put 10 seconds of time in between the 4 test groups shown on the screen shot above and ran the .NET memory profiler to see what was happening. It's not all that conclusive, in my opinion. Here's a graphic showing the Gen 0, 1 and 2 heap sizes over the lifetime of the test. Perhaps you can help me analyze what you see:

There is a very large spike during the third test in the Gen 1 heap size when the large data set of 40,000 values is placed into the SortedList. Most of that newly allocated memory seems to move to the Gen 2 heap when the last test kicks off, returning the Gen 1 heap almost back to its former value. The internal implementation of the SortedDictionary is based on a private, internal class in System.Collections.Generic called TreeSet<T>. In an upcoming blog post, I will be examining the TreeSet<T> class in detail. It uses a special kind of binary tree implementation known as a red-black tree. Why Microsoft didn't expose this incredibly cool class, I don't understand. So I suppose I should do that, right? Until next time...
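For readers who want to try the timing comparison without downloading anything, the following is a rough stand-in, not the DictionaryTestHarness itself. The names (DictionaryTimingSketch, ShuffledKeys, InsertThenRemove) and the fixed seed are assumptions, and unlike the harness it lumps insertions and removals into a single measurement.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class DictionaryTimingSketch
{
    // Builds 'count' distinct keys in shuffled (unsorted) order.
    public static int[] ShuffledKeys(int count, int seed)
    {
        var keys = new int[count];
        for (int i = 0; i < count; i++) keys[i] = i;

        // Fisher-Yates shuffle with a fixed seed so runs are repeatable.
        var rng = new Random(seed);
        for (int i = count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);
            int tmp = keys[i]; keys[i] = keys[j]; keys[j] = tmp;
        }
        return keys;
    }

    // Inserts every key, then removes every key, and returns the elapsed time.
    public static TimeSpan InsertThenRemove(IDictionary<int, int> dict, int[] keys)
    {
        Stopwatch watch = Stopwatch.StartNew();
        foreach (int key in keys) dict.Add(key, key);
        foreach (int key in keys) dict.Remove(key);
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        // The same two list sizes used in the post.
        foreach (int size in new[] { 1000, 40000 })
        {
            int[] keys = ShuffledKeys(size, 42);
            Console.WriteLine("{0} items, SortedList:       {1}", size,
                InsertThenRemove(new SortedList<int, int>(), keys));
            Console.WriteLine("{0} items, SortedDictionary: {1}", size,
                InsertThenRemove(new SortedDictionary<int, int>(), keys));
        }
    }
}
```

Timings from a sketch like this vary by machine and runtime, so treat them as relative indications only. Feeding it keys that are already sorted instead of shuffled is a quick way to see the special case discussed above.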

Source Code for the DictionaryTestHarness Application (6KB)

BCL | C# | Debugging | Software Development

Accessing Web Services from Silverlight 2

by kevin 7/10/2008 10:39:00 PM

I presented tonight (10 July 2008) to the Richmond .NET User Group. We had a pretty good turnout, I'm guessing 40 to 45 developers. I gave this same presentation at my office today as a dry run and as a training opportunity within the company. It's so good to see the developer community eager to learn. I've attached my slides and the three demonstrations projects I used in this post. I'll be giving this same presentation to the Charlottesville .NET User Group next Thursday (17 July 2008). The abstract we put on both user group websites follows:

Silverlight is a client-side technology. So it’s not really a part of your SOA strategy, right? You may want to think twice about that. SOAP and WSDL support are coming to the web desktop via Silverlight. And Silverlight has good client support for REST+ JSON/POX and RSS/ATOM-based web services, too. During this discussion, we’ll dive into data serialization, security and cross-domain access policy capabilities inside Silverlight 2 Beta 2. We also talk about the nuances and pitfalls of provisioning your web services for an Internet audience. This presentation will be heavy on coding, demonstration and interactive discussion.

Powerpoint Presentation (289KB)

Twitter solution showing how to invoke a cross-domain RESTful service by way of an in-domain SOAP service bypassing the cross-domain access policy problem. (842KB)

REST solution showing how to create RESTful services in WCF and how to consume RESTful services in Silverlight (307KB)

Silverlight syndication solution showing how to consume cross-domain RSS and Atom feeds using the SyndicationFeed class. (11KB)

Efficient Paging in SQL Server via LINQ

by kevin 7/6/2008 12:00:00 PM

UPDATE: I've included a videocast with this blog post. Let me know what you think. 

A few days ago, my buddy Justin Etheredge wrote a blog post about Efficient Paging in SQL Server. I was thinking about how transparent Language Integrated Query (LINQ) makes paging and I thought I'd blog about it. Two of the more interesting extension methods offered by LINQ are Skip() and Take(). You can use these extension methods to skip rows at the beginning of the query result and take only those you want to return. Sounds like paging to me. I wonder if Skip() and Take() used in combination with LINQ to SQL behave as efficiently as Justin's example? Let's take a look. Consider the following LINQ query:

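// LINQ to SQL data context for the AdventureWorks sample database.
var db = new AdventureWorksDataContext();
var query = from p in db.SalesOrderHeaders
  // Keep only the orders placed in the Northeast territory.
  where p.SalesTerritory.Name.Equals( "Northeast" )
  select new {
    p.Contact.FirstName,
    p.Contact.LastName,
    // Order total: quantity times unit price, summed over the order's line items.
    TotalSales = p.SalesOrderDetails.Sum(
      o => o.OrderQty * o.UnitPrice )
  };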

This small example uses the AdventureWorks SalesOrderHeaders as the input sequence and shapes the output sequence to include the associated Contact's name parts and the total value of each order. The total value is computed as the OrderQty times the UnitPrice for each associated item in the SalesOrderDetails table. There is a filter placed on the query to restrict the results to orders placed in the 'Northeast' territory. This simple query shows how easy it is to filter, perform arithmetic and use the relationships in a LINQ to SQL data context to traverse table relationships. What does this query look like when it's compiled for execution on SQL Server?

SELECT
  [t2].[FirstName],
  [t2].[LastName],
  (
    SELECT SUM([t4].[value])
    FROM
    (
        SELECT
          (CONVERT(Decimal(29,4),[t3].[OrderQty])) * [t3].[UnitPrice] AS [value],
          [t3].[SalesOrderID]
        FROM [Sales].[SalesOrderDetail] AS [t3]
    ) AS [t4]
    WHERE [t4].[SalesOrderID] = [t0].[SalesOrderID]
  ) AS [TotalSales]
FROM [Sales].[SalesOrderHeader] AS [t0]
LEFT OUTER JOIN [Sales].[SalesTerritory] AS [t1]
  ON [t1].[TerritoryID] = [t0].[TerritoryID]
INNER JOIN [Person].[Contact] AS [t2]
  ON [t2].[ContactID] = [t0].[ContactID]
WHERE [t1].[Name] = @p0

You can see the territory filter applied as a WHERE clause. Note that even when a string literal is used in the C# code, LINQ to SQL still passes filter values as parameters. In this case, the territory name 'Northeast' is passed as a parameter named @p0. This is always a good practice because it helps to thwart the injection of potentially malicious T-SQL into your query. We can see another interesting feature of LINQ to SQL in the generated T-SQL: projection. Because the C# code shown above shapes the output sequence to only a few required columns, the LINQ to SQL engine is smart enough to shape the T-SQL query to return only what's needed. Projection often improves query performance and reduces the amount of data sent over the wire.

Finally, notice that the third column projected into the output sequence, i.e. the sum of each order's value, is instantiated as a two-part, nested sub-SELECT operation in the T-SQL statement. The inner SELECT does the math on the order quantity and price. The containing SELECT aggregates the line item totals and then filters them to the rows selected by the outer query. Nicely done, LINQ! Now, we see that this is a long list, returning thousands of rows. If this query is meant for human consumption, we should break it into smaller chunks to make it easier to handle. How do we do that in LINQ? Add this to the C# code shown before.

var _pageNum = 3;
var _pageSize = 20;
query = query.Skip((_pageNum - 1) * _pageSize).Take(_pageSize);

This modification uses the Skip() and Take() extension methods to skip 40 rows and take the next 20 rows. In other words, at 20 results per page, this query now returns the 3rd page. Through the magic of deferred execution, we can add the Skip() and Take() extensions at any time before we begin iterating over the result set. This comes in handy when you want to enable paging for human consumption but disable it for B2B or ETL scenarios. Is the paged T-SQL query efficient though? You tell me. Here is the T-SQL that is produced:

SELECT
  [t6].[FirstName],
  [t6].[LastName],
  [t6].[value] AS [TotalSales]
FROM
(
  SELECT
    ROW_NUMBER() OVER
    (
      ORDER BY
        [t5].[FirstName],
        [t5].[LastName],
        [t5].[value]
    ) AS [ROW_NUMBER],
    [t5].[FirstName],
    [t5].[LastName],
    [t5].[value]
  FROM
  (
    SELECT
      [t2].[FirstName],
      [t2].[LastName],
      (
        SELECT
          SUM([t4].[value])
        FROM
        (
          SELECT
            (CONVERT(Decimal(29,4),[t3].[OrderQty])) * [t3].[UnitPrice] AS [value],
            [t3].[SalesOrderID]
          FROM [Sales].[SalesOrderDetail] AS [t3]
        ) AS [t4]
        WHERE [t4].[SalesOrderID] = [t0].[SalesOrderID]
      ) AS [value], [t1].[Name]
      FROM [Sales].[SalesOrderHeader] AS [t0]
      LEFT OUTER JOIN [Sales].[SalesTerritory] AS [t1]
        ON [t1].[TerritoryID] = [t0].[TerritoryID]
      INNER JOIN [Person].[Contact] AS [t2]
        ON [t2].[ContactID] = [t0].[ContactID]
    ) AS [t5]
    WHERE [t5].[Name] = @p0
  ) AS [t6]
WHERE [t6].[ROW_NUMBER] BETWEEN @p1 + 1 AND @p1 + @p2
ORDER BY [t6].[ROW_NUMBER]

If you read the query from the inside out, you'll see that on the inside it's essentially the same query that we saw before we added the paging feature. It has all the original sources, aliased t0 through t4, and is wrapped as a new result called t5. The SQL Server ROW_NUMBER() function is used to inject a row number into t5 ordered by all 3 projected columns. That looks a lot like the query Justin showed us in his blog post. Very efficient! The new result containing the row numbers is named t6.

Finally, the t6 result is filtered to a window of row numbers using two new parameters, @p1 and @p2. For page 3 paged in 20-row chunks as shown above, these parameters would have the values 40 and 20, respectively, so the filter BETWEEN @p1 + 1 AND @p1 + @p2 keeps rows 41 through 60. LINQ to SQL injects these skip and take parameters whenever you use Skip() and Take() together. Well, it almost always does that. If you happen to specify Skip(0), it reverts to the behavior that Take() uses without Skip(), which is to use SQL Server's TOP clause instead. LINQ to SQL sure knows how to sweet talk SQL Server, don't you think?
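The paging arithmetic can be tried without a database. The sketch below runs the same Skip()/Take() expression over an in-memory sequence (LINQ to Objects, not LINQ to SQL) and computes the row-number window that the generated WHERE clause describes. The names PagingSketch, Page and RowNumberWindow are made up for illustration, and the reading of @p1 as the Skip() count and @p2 as the Take() count is inferred from the T-SQL shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingSketch
{
    // Applies the same Skip/Take arithmetic to any sequence.
    public static IEnumerable<T> Page<T>(IEnumerable<T> source, int pageNum, int pageSize)
    {
        return source.Skip((pageNum - 1) * pageSize).Take(pageSize);
    }

    // Mirrors "ROW_NUMBER BETWEEN @p1 + 1 AND @p1 + @p2",
    // assuming @p1 is the Skip() count and @p2 is the Take() count.
    public static int[] RowNumberWindow(int pageNum, int pageSize)
    {
        int p1 = (pageNum - 1) * pageSize;  // rows skipped
        int p2 = pageSize;                  // rows taken
        return new[] { p1 + 1, p1 + p2 };   // first and last row number, inclusive
    }

    public static void Main()
    {
        // Stand-in data: the numbers 1..100 play the role of row numbers.
        IEnumerable<int> rows = Enumerable.Range(1, 100);

        // Deferred execution: this only composes the query.
        IEnumerable<int> page3 = Page(rows, 3, 20);

        // Enumeration happens here.
        int[] result = page3.ToArray();
        int[] window = RowNumberWindow(3, 20);

        Console.WriteLine("rows {0}..{1} ({2} rows)", result[0], result[result.Length - 1], result.Length);
        Console.WriteLine("window {0}..{1}", window[0], window[1]);
    }
}
```

Because the stand-in data is just the row numbers themselves, the first and last values on the page should match the window bounds.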

C# | LInQ | ORM | Software Development | SQL Server | SQL Server 2008

Software Development Meme

by kevin 6/30/2008 8:30:00 PM

Frank La Vigne called me out on the Software Development Meme. Michael Eaton started this one. The path to me is:

Michael Eaton >> Dan Rigsby >> Chad Campbell >> Pete Brown >> Frank La Vigne >> Kevin Hazzard 

Here's my response:

How old were you when you first started programming?

I got a hand-me-down TRS-80 Model II in 1981 from a friend who had recently upgraded to a new machine. I was 16 years old then. It was like crack for me. I was instantly addicted to software development or at least the idea of it.

How did you get started in programming?

In the early days, commercial opportunities outside of the mainframe computing space were rare. At age 16, I didn't have access to a mainframe yet. Small businesses likely to use things I could invent used paper to track their accounting and they weren't likely to switch to using a computing system that didn't fully replace their accounts payable, accounts receivable, general ledger and payroll systems.

Although spreadsheets like VisiCalc and Lotus 1-2-3 arrived on the scene early, they were seen initially only as helper applications for doing what-if analysis and such. And they weren't highly programmable anyway. So real programming jobs related to the first "killer application" were hard to create. As a result, those of us working on smaller, home-based computers had no real market for our ideas. So we built games and so-called bulletin boards until our age arrived.

It wasn't until I bought an AT&T PC 6300 Plus with an 80286 CPU in 1986 that I found my first real commercial opportunity writing a turn-key accounting and inventory tracking system for a chain of video stores.

What was your first language?

Zilog-80 Assembler. I was a math geek so assembly language spoke to my soul.

What was the first real program you wrote?

I had just turned 17. My Mom was working for a local hospital organizing paper-based records. The hospital had a new numbering system for patient records for which they had to apply brightly-colored sticky labels to manila folders. I wrote a program for her that determined how many of each numeric digit was required to label a range of numbers. I started with a brute force approach which was very slow and then created an algorithm with a fixed number of computations to yield the same result. The hospital bought the program from me for $50 to help them manage the disbursement of labels to all of the contractors working on the project. It was written in Color BASIC for the Tandy Radio Shack Color Computer. I ported it to GW-BASIC to run on the hospital's IBM PCs.

What languages have you used since you started programming?

Various assembly language dialects: Z80A, MC6809, Intel 80x86, DEC Alpha and Motorola PowerPC. I wrote assembler on a Perkin-Elmer mainframe but I don't remember what the CPU architecture was, to be honest. I remember it was pretty weird, more like a macro compiler than an assembler. I got my first C language compiler for the Tandy Radio Shack Color Computer although I preferred using the assembler on that machine.

I was introduced to C With Classes (which later became C++) in 1984. There was no C++ compiler for the Perkin-Elmer mainframe I was working on. So we used something called Cfront to translate the C++ text into C language which was then further compiled into a set of linkable objects. I worked in C and C++ almost exclusively from 1988 until 2002 with brief stints into Java-land.

In 2002, the bright light of C# shone upon me and I've been pretty happy since then. C# is more expressive and makes me more productive than any other language I've ever worked with. I love Python and I'm itching to use it commercially. However, finding real applications that I can't implement with C# is pretty tough. I'm not sure PowerShell is a real programming language but, if it is, it's definitely one of my favorites.

Every time I've been forced to work with JavaScript (directly or indirectly), I've felt like I needed to take a shower. If JavaScript or HTML ever becomes the right way to implement anything, I'm switching professions. Silverlight has me very excited, as you can imagine.

What was your first professional programming gig?

In my senior year as an undergraduate student, I was contracted to write an application for tracking the inventory and sales for a small chain of video stores. I built the whole thing using Symantec's Q&A product. It was a way-cool, weird product that was part word processor, part database, part reporting engine. You could do amazing things in a short time with Q&A that would take days or weeks to implement using other tools. I think it was the first RAD tool there ever was in the PC space.

If you knew then what you know now, would you have started programming?

Yes. I enjoy software development now more than ever.

If there is one thing you learned along the way that you would tell new developers, what would it be?

Teach part time. Whether you do it at work, in the user group community or as an adjunct faculty member at a local college (I do all 3), you'll find that you learn more, faster by preparing to teach than you could ever learn as a student. Teaching is a fantastic form of mental catharsis. Teaching also helps build your public speaking skills, which translates into greater responsibility and a higher salary as you move forward in your career. Developers who can build cogent arguments and present them to executives often go further and faster in their careers.

What’s the most fun you’ve ever had … programming?

When I worked at Intel Corporation in the Architecture Labs, the supercomputing group parallelized one of my fractal algorithms and made it available as a 3D "game". I had written the Pascal language implementation of the Julia Set solver in college while I was working on a paper concerning the Riemann-Stieltjes Integral. The Julia Set is a particularly beautiful fractal based on the concept of a Riemann map. Working with the supercomputing group to parallelize my algorithm and see it work in real-time 3D was a heart-poundingly cool experience. You could visually "drive through" the set to infinite levels of detail, zooming, panning and scanning in real time. What took my PC days to compute, the supercomputer was performing in milliseconds, allowing you to move through the Julia Set fluidly as if flying in a flight simulator. It was a breathtaking experience for me.

Who are you calling out?

Fun | Software Development

Murray Gordon on the Spark That Brought in the Heavyweights

by kevin 6/26/2008 9:39:00 PM

I use NHibernate and LINQ to SQL on a site that supports millions of end users. Murray Gordon, an Architect Evangelist at Microsoft, has written a nice synopsis of the Entity Framework Vote of No Confidence debate so far. More than anything else I've read on the subject, this article brings together facts and opinions that seem to ring true from my deep experience.

I'd like to go on record and say that the way this is being handled by the MVPs who signed the petition is unprofessional, in my opinion. Microsoft MVPs are not required to toe the line and agree with everything that Microsoft says and does. But there are ways to communicate with Microsoft that are constructive and there are ways that would make any corporation, including Microsoft, dig its heels into the dirt. This petition used the latter strategy, unfortunately. In particular, Microsoft MVPs have channels that the rest of us don't have. They should use them and not the blogosphere to make plain their grievances.

In some sense, I feel as though the signatories of the petition feel like they are playing Continental Congress against King George. But there's no Boston Tea Party here. Microsoft didn't raise any undue taxes from any of us. They simply put out a framework that's clearly a v1.0 product. Microsoft doesn't win with v1.0 products. It wins with version 4 products because as a corporation, it knows how to get the first down (an American football term meaning the team gets to stay on the offensive).

I say let Microsoft run the ball for a few plays and let's see how they do.

Architecture | Software Development | ORM

An Unfortunate Consequence of History

by kevin 6/25/2008 6:45:00 PM

Databases are just awful. I don't mean the products themselves but the concept of databases. Stop and think about how absurdly we behave when we write modern software. We generate scads of information in the course of our daily lives. (A scad for me is about 2Gb the last time I checked. May be more or less for you.) Much of this information approximates the lives we lead and the obligations we must honor. But rather than putting that information into a system that has the tools necessary to model the real world from which the data originally emanated, we usually choose to keep it in a place that does an efficient job of storage. When we need to put it back into the real life approximation engine, we shuttle the information in and out of our application servers as necessary. It's been estimated that as much as 50% of the time we spend in development is in bridging the gap between data storage and the business logic of our applications. That number may be an extreme, low or high. But even if this kind of work accounts for only 25% of our time, why would we choose to spend our development budget this way? Data is so simple. It should just be there, fully accessible to me all the time.

Some operating systems do a better job of closing the gap between code and data than others do. For example, the Pick System, originally developed by Dick Pick in the late 1960s, uses a hash-based file system to create associative arrays that are super-efficient for many query operations. The only data type in the Pick System is the string. And most importantly, the Pick database engine is not relational. It is multi-valued instead, meaning that any attribute that needs to have multiple values can just declare them. In the Pick mind, there's never a need to create related tables and join them for query or reporting. A platform that implements this type of database also typically ships with a Pick BASIC compiler which allows for direct manipulation of the query engine and the associative arrays it produces. The BASIC code runs right there in the database, not on a foreign system. Embedded Pick BASIC is not like the SQL CLR. The SQL CLR, for lack of a better term, is bolted onto the side of SQL Server. You can't do any real data manipulation in the SQL CLR. However, in Pick BASIC, you can freely manipulate schema and data directly. Forget for a moment that it's BASIC and you've got something great there. Compiled code running in the database that can manipulate database objects natively. Way cool and circa 1965.
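For readers who have never seen a multi-valued record, here is a loose C# analogy. It is not Pick BASIC and says nothing about how a real Pick system stores data; the names (MultiValueSketch, BuildCustomers) and the sample record are invented. It only illustrates the shape of the idea: a hashed lookup by item id, string-only attributes, and an attribute that simply holds several values where a relational design would add a second table and a join.

```csharp
using System;
using System.Collections.Generic;

public static class MultiValueSketch
{
    // One "file": item id -> attribute name -> list of string values.
    public static Dictionary<string, Dictionary<string, List<string>>> BuildCustomers()
    {
        var customers = new Dictionary<string, Dictionary<string, List<string>>>();
        customers["C100"] = new Dictionary<string, List<string>>
        {
            { "Name",  new List<string> { "Pat Example" } },
            // A multi-valued attribute: several phone numbers on the same record.
            { "Phone", new List<string> { "555-0100", "555-0101" } }
        };
        return customers;
    }

    public static void Main()
    {
        var customers = BuildCustomers();

        // One hashed lookup reaches the record; no second table, no join.
        foreach (string phone in customers["C100"]["Phone"])
        {
            Console.WriteLine(phone);
        }
    }
}
```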

IBM and InterSystems, among other vendors, still sell these databases like hotcakes today because they solve very real business problems for which relational databases are not ideally suited. First of all, they're fast. And I mean smokin' fast for many types of operations, especially high-volume transaction processing applications. This is partially because, with no join operations (in the classical sense), there's usually less work to do to obtain the data you're seeking. But even when a sub-select operation is required to get what you're looking for, the efficiency of the underlying hash-based file system pays off handily. In database terms, everything in the database is indexed, always.

My students and colleagues often hear me say that, "Databases are an unfortunate consequence of history." I say this (and believe it) because if you could travel back in time to 1948 and give the ENIAC developers at the University of Pennsylvania a handful of 4 GB DIMMs and the necessary plans to connect them to their invention, relational databases like Oracle and Microsoft SQL Server would simply never have evolved. I think that the development path would have been more like what Dick Pick envisioned and built instead. Given enough memory early in computing history, associative arrays, set operations and in-memory manipulation of large data sets would have been the norm. However, as we know, memory was severely constrained in the early days of computing. In fact, it's only been in the last few years, as new technology has allowed memory prices to drop dramatically, that it has been feasible to conceive of a solid-state database at all. Oracle's TimesTen and Microsoft's Project Code Name Velocity are leading-edge concepts in a new market segment that will, one day, fully realize Dick Pick's vision, in my opinion. I predict that accessing data from distributed, in-memory databases will become the norm within my lifetime.

Many of the current Object/Relational Mapping (O/RM) debates are centered around my database evolution postulate because O/RM tools attempt the inverse of what the Pick OS does to achieve the same effect. O/RM tools essentially pull as many database semantics (sans execution) into the application tier as possible where the logic of the program is codified. Whether we run Pick BASIC in the database or use an O/RM to marshal data close to our C# code, the desired outcome is the same. But pulling data into an external execution engine as O/RM tools do is pretty close to nightmarish, to be frank. In fact, Ted Neward, whom I greatly respect, calls O/RM the Vietnam of Computer Science today, meaning a quagmire from which one cannot possibly be extricated and for which there is no good outcome. Ouch! What a stinging rebuke from a guy who's singularly qualified to make an assessment in this space. Even Ayende Rahien's blog post from earlier today reveals a sense of desperation about the state of O/RM technology. What a mess we've gotten ourselves into! No O/RM suite that I know of addresses the real problem at hand, i.e. making data access so transparent that you don't even know you're doing it.

We use both NHibernate and Language Integrated Query (LINQ) to SQL at SnagAJob.com for O/RM. They make life easier in some ways but so much more difficult in others. I cannot begin to count up the hours we've spent tuning the session management code in NHibernate to deal with authentication and transaction management issues. And you don't burn up welterweight programmer resources on that kind of work. Your heavy hitters need to be deeply involved because there are architectural design issues at every turn. Every minute that your senior developers and architects are distracted with this kind of stuff, they aren't focusing on what you thought you hired them for. LINQ is better than NHibernate in a couple of ways, chiefly because of the expressiveness afforded by the IEnumerable<T> extension methods and the query comprehension syntax. But deploying LINQ to SQL or LINQ to Entities in a real-world environment is still not as simple as it should be. And the real goal of transparent data access is still far, far away using NHibernate or LINQ.

If you know of an O/RM suite that makes accessing SQL data more Pick-like as I've described, i.e. more transparent, I'd like to hear about it.

<Interesting Related Story> In 1993 while working for Datastorm Technologies, Inc., I attended Comdex in Las Vegas. At lunch one day, two fellows joined me at the table. The older fellow to my right introduced himself as Dick Pick. I asked him what he did for a living and he graciously and eagerly explained the Pick OS, its simple power and beauty and a smallish version of his life story. I was impressed but didn't really get it at the time, partly because the fellow seated across the table introduced himself as Phil Katz, the inventor of the PKZip file compression utility. For me, Phil Katz's fame overshadowed Dick Pick's because I didn't know any better. So, I didn't engage with Dick in conversation to the degree that I really should have. History, it seems, hasn't been all that respectful to Dick Pick either. Phil Katz has a detailed Wikipedia article about him yet Dick Pick doesn't, for example. Googling for Dick Pick yields scads (there's that word again) of Dick's Picks Grateful Dead references and nearly nil related to the computer science genius of our time. In retrospect, even being seated with a legend like Dick Pick was a real honor. I wish I had known to take advantage of the opportunity that was given to me. Live and learn. </Interesting Related Story>

Architecture | LInQ | Software Development | SQL Server | SQL Server 2008

Justin Etheredge on Functional Programming

by kevin 6/3/2008 10:07:00 AM

For those of you in central Virginia, my friend Justin Etheredge is speaking to the Richmond .NET User Group this Thursday evening (6/5) at 6:30 p.m. EDT in the Markel Building’s first floor salon at 4600 Cox Road regarding:

Functional Programming Features of C# 3.0
 

Justin’s one of the brightest, hardest-working people I know. Check out his blog. Come out and meet Justin and network with some other software developers in the community. You won’t be disappointed.

Oh, and did I mention that the victuals will be a la Maggiano’s Little Italy? Mmmmm……

C# | Richmond | Software Development | User Group

StyleCop MSBuild Integration

by kevin 5/24/2008 8:10:00 PM

The Source Code Analysis team at Microsoft has a new blog. An article was published today by Jason Allor describing how to do MSBuild integration. I was going to write an article on this subject but why reinvent the wheel when Jason did such a good job?

C# | Software Development | MSBuild

StyleCop - Microsoft Source Code Analysis for CSharp

by kevin 5/23/2008 11:07:00 PM

Microsoft released its source code analysis tool for C# today. It's called StyleCop, and it is sort of like FxCop, except that it analyzes source code rather than compiled assemblies. This tool is available for download here. I've been getting some coding standards in place at SnagAJob.com, so this tool comes at the perfect time. StyleCop works with both Visual Studio 2005 and 2008. After you install it, right-clicking on a file in the Solution Explorer or in the code editor will present a new option on the context menu entitled Run Source Analysis, like this:

[Screenshot: the Run Source Analysis option on the context menu]

Clicking on the new option will run the source code analysis tool on the selected file(s) and produce a report in a new source analysis window that looks like this:

[Screenshot: the source analysis window listing the StyleCop violations]

Notice that each StyleCop violation shown has a code beginning with the letters SA, followed by a numeric identifier. The codes defined by the StyleCop rules are grouped in the following way:

  • Spacing rules - 1000 series
  • Readability rules - 1100 series 
  • Ordering rules - 1200 series 
  • Naming rules - 1300 series
  • Maintainability rules - 1400 series 
  • Layout rules - 1500 series 
  • Documentation rules - 1600 series

In the Source Analysis output window above, notice the error SA1027. That rule is not valid for me because, for my projects, I want to promote the use of tabs in source code whenever possible. So how can I turn this source code analysis error off? Well, in the installation folder for StyleCop, there is an XML file called Settings.SourceAnalysis. On my Longhorn installation, this can be found in D:\Program Files\Microsoft Source Analysis Tool for C#. On your system, that folder may be elsewhere. There is also an executable program named SourceAnalysisSettingsEditor.exe in the installation folder. If you run the settings editor, passing the name of the settings file as the first parameter, you'll see something like this:

[Screenshot: the settings editor with the rule groups listed under C#]

Notice the seven items under C# in the tree on the left? Those are the seven rule groups I mentioned earlier. Since it's the rule for SA1027 that I want to disable, I simply open up the Spacing Rules node and locate that rule by its code. I need to uncheck the checkbox for that rule like this:

[Screenshot: the SA1027 rule unchecked under the Spacing Rules node]

Clicking on the Apply button, the change is made to the XML settings file. Opening the XML file in Notepad, we can see the change documented this way:
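In text form, the relevant fragment would look roughly like the sketch below. The names SpacingRules, TabsMustNotBeUsed and Enabled and the value False are the ones described in this post. The surrounding element names, the analyzer identifier and the version attribute are my assumptions about the file layout, so check them against your own Settings.SourceAnalysis before relying on them.

```xml
<!-- Sketch only: element names, AnalyzerId and Version are assumptions. -->
<SourceAnalysisSettings Version="4.2">
  <Analyzers>
    <Analyzer AnalyzerId="Microsoft.SourceAnalysis.CSharp.SpacingRules">
      <Rules>
        <!-- SA1027: the rule that complains about tabs -->
        <Rule Name="TabsMustNotBeUsed">
          <RuleSettings>
            <BooleanProperty Name="Enabled">False</BooleanProperty>
          </RuleSettings>
        </Rule>
      </Rules>
    </Analyzer>
  </Analyzers>
</SourceAnalysisSettings>
```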

From the XML, you can see that the Boolean property called Enabled has been set to False for the rule named TabsMustNotBeUsed within the SpacingRules. And running the source code analysis on my file(s) from Visual Studio again shows that the SA1027 error related to the use of tabs in the source code has been eliminated from the output. Nice. You can tune the master settings file for all the other rules to suit your needs in the same manner. I'll be blogging more about MSBuild integration and other cool stuff that StyleCop can do in the near future.

C# | Software Development | StyleCop

Partial Methods in C# 3.0

by kevin 5/16/2008 3:00:00 AM

I've been using the LINQ to Entities and LINQ to SQL modeling tools a lot recently. They are pretty good. Having experienced the pain of disconnectedness in using MyGeneration and NHibernate in my company's re-architecture effort over the last year, it feels good to have some O/RM support in the Visual Studio IDE and in the C# language. I can say with certainty that if LINQ had been ready in April of 2007, when I had to make the choice, I would never have chosen to use NHibernate.

It's not that NHibernate is bad. Quite the contrary. It's both rich and expressive. But the level of effort that's required to get an effective NHibernate implementation up and running is overwhelming for most companies. True expertise must be developed to use NHibernate effectively. I've experienced that phenomenon first hand. It can take literally weeks or months if you have a complex schema with which you must integrate. And you cannot overlook the fact that while you may be able to afford to train one or two of your developers to the required level of expertise in using NHibernate, the real value of any software system comes from getting most of your staff to participate in the development process. Otherwise, you will create a bottleneck for new development, maintenance and problem triage.

LINQ, on the other hand, can be learned pretty well by intermediate-skilled developers in a few days, perhaps a week at most, against a comparably complex schema. And with the help of the Query Comprehension Syntax, most developers who are comfortable writing SQL queries feel right at home with LINQ in no time at all. Granted, achieving true expertise with LINQ also takes weeks or months. However, in my experience, expertise is not required to be functional with LINQ. That is The Microsoft Way, isn't it? That's how Microsoft wins.

OK, let's talk about partial methods and how the LINQ data modeling tools use them. Looking at the code generated for the Northwind database by the LINQ to Entities modeling tool, for example, you'll see a C# class representing the Categories table that looks something like this:

public partial class Categories : EntityObject
{
    private string _CategoryName;
    partial void OnCategoryNameChanging(string value);
    partial void OnCategoryNameChanged();

    public string CategoryName
    {
        get { return this._CategoryName; }
        set
        {
            this.OnCategoryNameChanging(value);
            this._CategoryName = StructuralObject.SetValidValue(value, false, 15);
            this.OnCategoryNameChanged();
        }
    }
}

The first time I looked at this kind of code a few months ago, I cocked my head like the puppy on the old RCA label staring into the phonograph as if to say, "What is this?" What is this odd use of the partial keyword on a method declaration? I had been tracking changes in the C# 3.0 specification but is it possible that I missed this new feature? Indeed, it seems so. Well, there's no time to learn like the present. So how do partial methods work?

Partial methods like OnCategoryNameChanging and OnCategoryNameChanged shown above must be declared within a partial class. We know how partial classes work. Multiple parts of a larger class, typically defined in separate files, are assembled by the compiler into a single, cohesive class. So, does a partial method behave the same way? Can I define a single method in multiple parts and have the compiler coalesce the parts together? No, that's not what the term partial means in this case. How would the compiler know the order in which to apply the method parts, for example? With partial classes, ordering of discrete (and complete) member methods into the whole is not an issue. However, if methods were allowed to be defined in part, ordering would be of paramount importance.

So, I suppose it's somewhat unfortunate for us that the keyword partial is used in this context. A partial method can't really be defined in parts. In deciding whether to reuse an existing keyword in this case or to create a new one, the language designers opted for the former. There is precedent for a decision like that. The abstract keyword may be used in C# to mark a class or a method, too. In Visual Basic .NET, the abstract concept applied to a class uses the keyword MustInherit, which is actually quite intuitive. The concept of an abstract method in Visual Basic .NET uses the keyword MustOverride, also self-documenting. But, in C# we just reuse the keyword abstract for both of these cases. The baggage of a bygone era, I suppose.

Using the abstract keyword precedent for illustration here has even more value. Looking at the definition of the OnCategoryNameChanging and OnCategoryNameChanged methods in the Categories class above, they appear to be very much like abstract methods. The partial methods have no implementation, i.e. no curly braces and no actual instruction code within. So how do they really work? Let's examine another part of the Categories class in the same assembly, a part not written by the code generation tool:

public partial class Categories
{
    partial void OnCategoryNameChanging(string value)
    {
        // examine the value and throw an exception
        // to stop the change if necessary
        // because of the way the CategoryName property's set
        // accessor is written, this is like an AOP before advisor
    }

    partial void OnCategoryNameChanged()
    {
        // do whatever makes sense with the changed name
        // because of the way the CategoryName property's set
        // accessor is written, this is like an AOP after advisor
    }
}

This part of the Categories class would presumably be written by a human being interested in intercepting the before and after events associated with a changing category name attribute. What the compiler does with this is really quite extraordinary. If the OnCategoryNameChanging and OnCategoryNameChanged methods had been declared as abstract, the Categories class would have also been abstract by definition. Going back to the Visual Basic .NET keywords, a method that one MustOverride would necessarily be part of a class that one MustInherit. That makes sense, right?
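To make the abstract alternative concrete, here is a minimal sketch of what the hook would look like if it were declared abstract. The names CategoryBase, ValidatedCategory and AbstractDemo are hypothetical and are not anything the LINQ tools generate. The point is only that the abstract method forces the class to be abstract and forces a derived type to exist before anything can be instantiated.

```csharp
using System;

// Hypothetical stand-in for a generated entity whose hook is abstract.
public abstract class CategoryBase
{
    private string name;

    // Abstract: every concrete subclass must supply a body.
    protected abstract void OnNameChanging(string value);

    public string Name
    {
        get { return name; }
        set
        {
            OnNameChanging(value); // runs before the change is applied
            name = value;
        }
    }
}

// The derivation that "gives the method its shape".
public sealed class ValidatedCategory : CategoryBase
{
    protected override void OnNameChanging(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            throw new ArgumentException("A category needs a name.", "value");
        }
    }
}

public static class AbstractDemo
{
    public static void Main()
    {
        // "new CategoryBase()" would not compile; a subclass is required.
        CategoryBase category = new ValidatedCategory();
        category.Name = "Beverages";
        Console.WriteLine(category.Name);
    }
}
```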

But partial methods are not abstract, even though they appear to be quite similarly formed. In the case of an abstract method, the programmer is essentially saying that a derivation must exist that gives the method its shape. What a partial method implies instead is that the shape of a method may or may not exist at the programmer's discretion. If the programmer chooses to provide an implementation, then it can be called. If not, that's OK, too. This is really useful in the case of LINQ because we can establish an AOP-like advisory pattern around changing attributes (columns) very simply. If the programmer needs the before and/or after advice for some value, she can implement the partial methods to get it.

This led me to ponder, "What does the compiler emit to IL if one or both of the partial method advisors aren't implemented?" The answer is nothing. If you don't implement a partial method, the compiler simply emits no code wherever it is invoked. For unimplemented partial methods, it's as if the declaration never existed. Very efficient, indeed. So, partial methods give us some of the power of abstraction without forcing us to derive a whole new type to get it. Of course, there are limitations. Because partial class parts are limited to a single assembly, partial method implementations are also limited to a single assembly. Abstract classes don't have such a restriction.
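Here is a small, self-contained sketch you can compile to see that behavior for yourself. The Category class, the OnAudit hook, the BuildAuditMessage helper and the AuditMessagesBuilt counter are hypothetical names invented for illustration; they are not part of the generated LINQ code. Only OnNameChanging is implemented, so the calls to the other two hooks should vanish, including the evaluation of the argument passed to OnAudit.

```csharp
using System;
using System.Reflection;

// Part 1: stands in for the tool-generated file.
public partial class Category
{
    private string name;

    // Counts how many times BuildAuditMessage actually runs.
    public int AuditMessagesBuilt;

    // Partial method declarations: implicitly private, must return void.
    partial void OnNameChanging(string value);
    partial void OnNameChanged();
    partial void OnAudit(string message);

    public string Name
    {
        get { return name; }
        set
        {
            OnNameChanging(value);
            name = value;
            OnNameChanged();                   // no implementation: call is removed
            OnAudit(BuildAuditMessage(value)); // removed too, argument and all
        }
    }

    private string BuildAuditMessage(string value)
    {
        AuditMessagesBuilt++;
        return "Name set to " + value;
    }
}

// Part 2: stands in for the hand-written file. Only one hook is implemented.
public partial class Category
{
    partial void OnNameChanging(string value)
    {
        if (string.IsNullOrEmpty(value))
        {
            throw new ArgumentException("A category needs a name.", "value");
        }
    }
}

public static class PartialDemo
{
    public static void Main()
    {
        Category category = new Category();
        category.Name = "Beverages";

        BindingFlags flags = BindingFlags.Instance | BindingFlags.NonPublic;
        Console.WriteLine(category.Name);               // Beverages
        Console.WriteLine(category.AuditMessagesBuilt); // 0
        // The implemented hook exists in the compiled type; the others do not.
        Console.WriteLine(typeof(Category).GetMethod("OnNameChanging", flags) != null); // True
        Console.WriteLine(typeof(Category).GetMethod("OnNameChanged", flags) != null);  // False
    }
}
```

The counter staying at zero is the interesting part: because OnAudit has no implementing declaration, the whole invocation statement is dropped, so BuildAuditMessage never runs. That is worth remembering if you ever pass an expression with side effects to a partial method.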

C# | Software Development

W. Kevin Hazzard Welcome to Kevin Hazzard's Blog. Kevin is a Software Architect, Professor and Microsoft MVP specializing in C#, WCF, Silverlight and IronPython.

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2008
