Dictionary Classes Benchmarked

by kevin 7/13/2008 8:30:00 PM

A few days ago, I stumbled on an article by Amit Raz about the SortedList<K,T> on Dev102.com. In the article, which compares and contrasts the SortedList collection class in the .NET BCL to the SortedDictionary class, Amit concludes with, "So what is the SortedList good for? Beats me. I deem it useless." His conclusion seemed to be predicated on the fact that the Add() method would throw an exception if the programmer attempted to insert a duplicately keyed entry into a SortedList. However, this is documented behavior. And both the SortedList and the SortedDictionary exhibit that same behavior. An indexer exists on each of those collections that will allow the insertion of a duplicately keyed item. For both classes, duplicates replace the original values when using the indexer.

I had a difficult time following Amit's logic but he's a bright fellow so I wanted to find out if the SortedList really was useless as he had proclaimed. On this page, Microsoft provides a table that describes the benefits and relative drawbacks of these two seemingly similar classes. Long story short, the key advantages of the SortedList<K,T> are that it uses less memory than the SortedDictionary<K,T> and that some of its members that return keys and values are indexed. OK, so the SortedList uses less memory. But what's that latter claim about?

Well, the SortedList<K,T> has a few members that the SortedDictionary<K,T> does not. Among them are:

  • int IndexOfKey( K key )
  • int IndexOfValue( T value )

Kicking around in this class in Reflector, one sees that the internal storage for the SortedList<K,T> is a pair of arrays: one for the keys which is kept in sorted order and one for the values which is kept in insertion order. When using the IndexOfKey method noted above, Array.BinarySearch() is used to perform an efficient search for the desired key. However, the IndexOfValue method uses a brute force (O(n)) scan to find the requested value. Another special case advantage of the SortedList is that insertions are O(1) for data inserted in sorted order whereas for the SortedDictionary, the average insertion cost is around O(log n). In fact, for data already in sorted order, the SortedDictionary pays a bit of a penalty on insertion because of the required balancing of the tree structure used to store the information. More on that later. So, if you have small lists and the keys are already sorted before insertion, the SortedList might be a good choice. If your keys are not sorted, insertion into a SortedList could be as bad as O(n). So it may be that you have to know something about your data to use the SortedList in a way that makes it worthwhile. Not understanding your data, could make the SortedList worthless, as Amit claims.

I wrote a small test harness that exercises the SortedList<K,T> and SortedDictionary<K,T>. I've included a link to the source code below. The application runs a battery of tests using a small list of 1,000 items on each type of dictionary and the same battery using a large list of 40,000 items. The main window looks like this:

These results show an average test run on my Windows Server 2008 machine using the .NET Framework 3.5. For the small list of 1,000 items, the SortedList seems to be a bit more efficient than the SortedDictionary with respect to time, despite the fact that the keys are not in sorted order for that test. However, when it comes to the larger list of 40,000 items, the SortedDictionary is the clear winner for both insertions and removals. But what about memory? Remember that Microsoft's MSDN topic said that the SortedList can be more memory efficient? I put 10 seconds of time in between the 4 test groups shown on the screen shot above and ran the .NET memory profiler to see what was happening. It's not all that conclusive, in my opinion. Here's a graphic showing the Gen 0, 1 and 2 heap sizes over the lifetime of the test. Perhaps you can help me analyze what you see:

There is a very large spike during the third test in the Gen 1 heap size when the large data set of 40,000 values in placed into the SortedList. Most of that newly allocated memory seems to move to the Gen 2 heap when the last test kicks off, returning the Gen 1 heap almost back to it's former value. The internal implementation of the SortedDictionary is based on a private, internal class in System.Collections.Generic called TreeSet<T>. In an upcoming blog post, I will be examining the TreeSet<T> class in detail. It uses a special kind of binary tree implementation known as a red-black tree. Why Microsoft didn't expose this incredibly cool class, I don't understand. So I suppose I should do that, right? Until next time...

Source Code for the DictionaryTestHarness Application (6KB)

Be the first to rate this post

  • Currently 0/5 Stars.
  • 1
  • 2
  • 3
  • 4
  • 5

Tags: ,

BCL | C# | Debugging | Software Development

Related posts

Comments

7/14/2008 8:09:10 AM

Justin Etheredge

Nice post, I am going to have to check out the TreeSet<T>, I had no idea that was in there.

Justin Etheredge us

7/16/2008 10:35:59 AM

W. Kevin Hazzard

I am preparing a blog post about TreeSet<T> now. I have reverse engineered and extracted the class in it's entirety and made it available in my gotnet library.

Kevin

W. Kevin Hazzard us

7/17/2008 9:07:40 AM

pingback

Pingback from professionalaspnet.com

Chris Love's Official Blog - Professional ASP.NET : Links of the Week July 16th 2008

professionalaspnet.com

7/19/2008 10:23:41 PM

michael herndon

since the introduction of linq is almost complete (as soon as the SP1 comes out with the official linq to entities) how much more efficient would it be to do a linq to object query over a generic SortedList and generic sorted dictionary?

michael herndon us

7/20/2008 11:12:22 AM

W. Kevin Hazzard

@michael herndon: LINQ is all about enumeration or sequential access to multiple records. Only one of the tests in my suite here uses enumeration. The others take advantage of the mapping function that a dictionary provides. The SortedList simply walks the sorted List<K> that it maintains for the key values whereas the SortedDictionary does a right-first walk of the balanced red-black tree called TreeSet<T> to do enumeration. LINQ would simply take advantage of those built-in enumerators so it wouldn't be any more or less efficient.

W. Kevin Hazzard us

Add comment


(Will show your Gravatar icon)  

  Country flag

[b][/b] - [i][/i] - [u][/u]- [quote][/quote]



Live preview

8/28/2008 1:55:04 AM

Powered by BlogEngine.NET 1.3.1.0
Theme by Mads Kristensen


Kevin's on Twitter / FriendFeed

W. Kevin Hazzard Welcome to Kevin Hazzard's Blog. Kevin is a Software Architect, Professor and Microsoft MVP specializing in C#, WCF, Silverlight and IronPython.

View Kevin Hazzard's profile on LinkedIn
Microsoft MVP Award When a problem comes along, you must flip it!

Calendar

<<  August 2008  >>
MoTuWeThFrSaSu
28293031123
45678910
11121314151617
18192021222324
25262728293031
1234567

View posts in large calendar

Recent comments

Authors

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in anyway.

© Copyright 2008

Sign in