Dictionary Classes Benchmarked

by kevin 7/13/2008 8:30:00 PM

A few days ago, I stumbled on an article by Amit Raz about the SortedList<K,T> on Dev102.com. In the article, which compares and contrasts the SortedList collection class in the .NET BCL to the SortedDictionary class, Amit concludes with, "So what is the SortedList good for? Beats me. I deem it useless." His conclusion seemed to be predicated on the fact that the Add() method would throw an exception if the programmer attempted to insert a duplicately keyed entry into a SortedList. However, this is documented behavior. And both the SortedList and the SortedDictionary exhibit that same behavior. An indexer exists on each of those collections that will allow the insertion of a duplicately keyed item. For both classes, duplicates replace the original values when using the indexer.

I had a difficult time following Amit's logic but he's a bright fellow so I wanted to find out if the SortedList really was useless as he had proclaimed. On this page, Microsoft provides a table that describes the benefits and relative drawbacks of these two seemingly similar classes. Long story short, the key advantages of the SortedList<K,T> are that it uses less memory than the SortedDictionary<K,T> and that some of its members that return keys and values are indexed. OK, so the SortedList uses less memory. But what's that latter claim about?

Well, the SortedList<K,T> has a few members that the SortedDictionary<K,T> does not. Among them are:

  • int IndexOfKey( K key )
  • int IndexOfValue( T value )

Kicking around in this class in Reflector, one sees that the internal storage for the SortedList<K,T> is a pair of arrays: one for the keys which is kept in sorted order and one for the values which is kept in insertion order. When using the IndexOfKey method noted above, Array.BinarySearch() is used to perform an efficient search for the desired key. However, the IndexOfValue method uses a brute force (O(n)) scan to find the requested value. Another special case advantage of the SortedList is that insertions are O(1) for data inserted in sorted order whereas for the SortedDictionary, the average insertion cost is around O(log n). In fact, for data already in sorted order, the SortedDictionary pays a bit of a penalty on insertion because of the required balancing of the tree structure used to store the information. More on that later. So, if you have small lists and the keys are already sorted before insertion, the SortedList might be a good choice. If your keys are not sorted, insertion into a SortedList could be as bad as O(n). So it may be that you have to know something about your data to use the SortedList in a way that makes it worthwhile. Not understanding your data, could make the SortedList worthless, as Amit claims.

I wrote a small test harness that exercises the SortedList<K,T> and SortedDictionary<K,T>. I've included a link to the source code below. The application runs a battery of tests using a small list of 1,000 items on each type of dictionary and the same battery using a large list of 40,000 items. The main window looks like this:

These results show an average test run on my Windows Server 2008 machine using the .NET Framework 3.5. For the small list of 1,000 items, the SortedList seems to be a bit more efficient than the SortedDictionary with respect to time, despite the fact that the keys are not in sorted order for that test. However, when it comes to the larger list of 40,000 items, the SortedDictionary is the clear winner for both insertions and removals. But what about memory? Remember that Microsoft's MSDN topic said that the SortedList can be more memory efficient? I put 10 seconds of time in between the 4 test groups shown on the screen shot above and ran the .NET memory profiler to see what was happening. It's not all that conclusive, in my opinion. Here's a graphic showing the Gen 0, 1 and 2 heap sizes over the lifetime of the test. Perhaps you can help me analyze what you see:

There is a very large spike during the third test in the Gen 1 heap size when the large data set of 40,000 values in placed into the SortedList. Most of that newly allocated memory seems to move to the Gen 2 heap when the last test kicks off, returning the Gen 1 heap almost back to it's former value. The internal implementation of the SortedDictionary is based on a private, internal class in System.Collections.Generic called TreeSet<T>. In an upcoming blog post, I will be examining the TreeSet<T> class in detail. It uses a special kind of binary tree implementation known as a red-black tree. Why Microsoft didn't expose this incredibly cool class, I don't understand. So I suppose I should do that, right? Until next time...

Source Code for the DictionaryTestHarness Application (6KB)

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags: ,

BCL | C# | Debugging | Software Development

Will Silverlight Topple ClickOnce?

by kevin 4/3/2008 10:21:00 PM

I attended the Richmond .NET User Group meeting this evening. Craig Adams of Vivus Software presented concerning ClickOnce application deployment issues from the real world. It was a good presentation. Although I know quite a bit about ClickOnce deployment, I learned a few things at the meeting. As I sat there, listening to Craig, I couldn't help but think about how Silverlight 2 packages its XAP files to include an application manifest. Certainly, the CoreCLR and BCL in Silverlight aren't as full featured as what the full CLR and FCL provide. And the range of deployment options in ClickOnce is good for system administrators and who need finer control over the process.

But I found myself thinking about how simple Silverlight 2 deloyment is. And now that we have the Silverlight Controls in Beta 1, how long will it be before we start seeing mass migration to Silverlight for doing what we once would have used ClickOnce deployment? At SnagAJob.com, we have one tool that we developed as a ClickOnce application because it was just too hard to build as a web application. So, we have this fantastic web application for our employer community with a hole where the ClickOnce application would go if we could make it run on the web without a lot of effort. So, we are actively porting it to Silverlight right now and I suppose we will retire the ClickOnce application as soon as it passes QA.

Personally, I'm excited thinking about a web-centric world where the browser can easily host rich, interactive forms that run my C# code on the desktop. I understand the glitz and glamour of Silverlight as a media-savvy presentation engine. But if I can get it to do Windows Forms cleanly and easily, I'll be very happy. Let's face it. Web development is just terrible today because HTML and JavaScript are awful. I tell people all the time that if I had to write HTML and JavaScript every day, I would find another lineof work. Using C# and XAML on the desktop through the browser is the best idea in computing in the last decade, in my opinion.

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags: ,

BCL | C# | Richmond | Silverlight | Software Development | User Group

The System.DateTimeOffset Type

by kevin 2/21/2008 10:29:00 PM
I've just been experimenting with the new DateTimeOffset type in .NET 3.5. It's about time. I've been evangelizing for the use of UTC for storage of times since it was called GMT. And it seems that SQL 2008 has a new DATETIMEOFFSET type, too. Now, instead of storing UTC offset as a separate attribute in every table that contains a DATETIME value, I can store the original date and it's original offset at the time of storage in one column. That will be nice. It remains to be seen how this will affect my T-SQL practices and the various ORM technologies I support. It seems that LINQ to SQL will be getting support for DateTimeOffset and related classes, too. I am somewhat disappointed that the new DATE and TIME types from SQL 2008 aren't available natively in .NET. There are good arguments for being able to store date and time references separately. Anyway, one step at a time, right?

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags: , ,

LInQ | SQL Server 2008 | CTS | BCL

Powered by BlogEngine.NET 1.3.1.0
Theme by Mads Kristensen


Kevin's on Twitter / FriendFeed

W. Kevin Hazzard Welcome to Kevin Hazzard's Blog. Kevin is a Software Architect, Professor and Microsoft MVP specializing in C#, WCF, Silverlight and IronPython.

View Kevin Hazzard's profile on LinkedIn
Microsoft MVP Award When a problem comes along, you must flip it!

Calendar

<<  August 2008  >>
MoTuWeThFrSaSu
28293031123
45678910
11121314151617
18192021222324
25262728293031
1234567

View posts in large calendar

Recent comments

Authors

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2008

Sign in