got net?

Kevin Hazzard's Brain Spigot

Disclaimer

The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2010

JavaScript Object Notation (JSON) in .NET Part 1

Welcome to Part 1 of a multi-part series on JavaScript Object Notation (JSON) support in the Microsoft .NET Framework. In this article, we'll focus on WCF's DataContractJsonSerializer. You can find the other parts of this series at these locations:

  • Part 1 - An exploration of the DataContractJsonSerializer class
  • Part 2 - An exploration of the JavaScriptSerializer class
  • Part 3 - JSON serialization from WCF (including a REST primer)
  • Part 4 - Using the JavaScriptConverter class to customize JSON serialization

You can use the DataContractJsonSerializer to stream any serializable POCO into its JSON representation. This is handy for working with AJAX controls, but you aren't limited to using JSON in a web context at all. Look at the following simple class definition in C#:

[Serializable]
public class Thingy
{
    public string Name { get; set; }
    public int ID { get; set; }
    public int Age { get; set; }
}

Classes that will be serialized or deserialized with the DataContractJsonSerializer in .NET must be marked as [Serializable] or as a [DataContract]. Code that creates an instance of this class and serializes it to JSON using the WriteObject method of the DataContractJsonSerializer might look like this:

private static void Main()
{
    // create a Thingy and dehydrate it into JSON
    var ser = new DataContractJsonSerializer( typeof( Thingy ) );
    var t1 = new Thingy
                 {
                     ID = 1,
                     Name = "Kevin",
                     Age = 44
                 };
    var mstrm = new MemoryStream();
    ser.WriteObject( mstrm, t1 );
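    // ToArray returns only the bytes written; GetBuffer would also
    // return the unused tail of the stream's internal buffer
    var jsonText = Encoding.UTF8.GetString( mstrm.ToArray() );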
    Console.WriteLine( jsonText );
}

Running this code yields the following text in the console window (line breaks added here for readability):

{
   "<Age>k__BackingField":44,
   "<ID>k__BackingField":1,
   "<Name>k__BackingField":"Kevin"
}

Simple enough. Notice that JSON doesn't qualify any of the data by type. You can see that the member names are odd-looking, too. That's because a [Serializable] class is serialized field by field, and the fields here are the backing fields that the C# compiler generates for automatic properties. The compiler names them so that they are guaranteed not to clash with any name you can create yourself, which also makes them ugly and hard to read. Let's mark the Thingy class as a DataContract instead to see what effect that has on the output of our little program.

[DataContract]
public class Thingy
{
    [DataMember]
    public string Name { get; set; }
    [DataMember]
    public int ID { get; set; }
    [DataMember]
    public int Age { get; set; }
}

If you run the small program above against the DataContract-oriented Thingy class, you now get this much cleaner-looking output:

{
   "Age":44,
   "ID":1,
   "Name":"Kevin"
}

That's much better, don't you think? It's also smaller, which will make transmission faster, too. OK, now let's focus on deserialization. Of course, you could deserialize the MemoryStream containing the JSON text into a new Thingy instance using the ReadObject method of the DataContractJsonSerializer. But that's no fun at all. We should do something a bit more interesting with our time. Since JSON doesn't qualify the types of the properties, we should be able to deserialize the JSON text into a different class that has properties with compatible names. But does order matter? And does it matter if the JSON text contains data for properties that don't exist in the target type? Let's test those ideas. Look at the following class definition for an OtherThingy class:

[DataContract]
public class OtherThingy
{
    [DataMember]
    public float Age { get; set; }
    [DataMember]
    public string Name { get; set; }
}

The OtherThingy class has an Age property but it's a floating point type, unlike the integer Age property in the Thingy class. Also notice that the Age property of the OtherThingy class is specified before the Name property reading top to bottom. Finally, note that the OtherThingy class contains no ID property at all. Will the serialized Thingy JSON text deserialize correctly into a new OtherThingy instance? Here's a program that tests these ideas:

using System;
using System.Collections.Generic;
using System.IO;
using System.Runtime.Serialization.Json;
using System.Runtime.Serialization;

// to run this code, be sure to reference
// the following assemblies:
// A. System
// B. System.Core
// C. System.Runtime.Serialization
// D. System.ServiceModel.Web
// E. System.Xml

[DataContract]
public class Thingy
{
    // we'll dehydrate an instance of this class
    // through the JSON data contract serializer
    [DataMember]
    public string Name { get; set; }
    [DataMember]
    public int ID { get; set; }
    [DataMember]
    public int Age { get; set; }
}

[DataContract]
public class OtherThingy
{
    // then we'll rehydrate into an instance of
    // this class - notice the differences:
    // 1. the Age properties types don't match
    // 2. the ID property is missing here
    // but that's OK - it will still work!
    [DataMember]
    public float Age { get; set; }
    [DataMember]
    public string Name { get; set; }
}

internal static class Program
{
    private static void Main()
    {
        // create a Thingy and dehydrate it into JSON
        var ser = new DataContractJsonSerializer(
            typeof( Thingy ) );
        var t1 = new Thingy
                 {
                     ID = 1,
                     Name = "Kevin",
                     Age = 44
                 };
        var mstrm = new MemoryStream();
        ser.WriteObject( mstrm, t1 );

        // dump Thingy object details to console
        Console.WriteLine( "Serializing {0} from:",
            t1.GetType().Name );
        t1.ObjectProperties().ForEach(
            kvp => Console.WriteLine( "({2}) {0} = {1}",
                       kvp.Key, kvp.Value,
                       kvp.Value.GetType().Name ) );

        // rewind the stream so it can be read
        mstrm.Position = 0;

        // rehydrate an OtherThingy from JSON
        ser = new DataContractJsonSerializer(
            typeof( OtherThingy ) );
        var t2 = ser.ReadObject( mstrm );

        // dump OtherThingy object details
        Console.WriteLine( "Deserialized {0} as:",
            t2.GetType().Name );
        t2.ObjectProperties().ForEach(
            kvp => Console.WriteLine( "({2}) {0} = {1}",
                       kvp.Key, kvp.Value,
                       kvp.Value.GetType().Name ) );

        // wait for user input
        Console.WriteLine( "Done. Press Enter . . ." );
        Console.ReadLine();
    }

    // an extension method to stream object properties
    // into KeyValuePairs - wish I had a Tuple type :(
    private static IEnumerable<KeyValuePair<string, object>>
        ObjectProperties( this object obj )
    {
        foreach (var pi in obj.GetType().GetProperties())
        {
            yield return new KeyValuePair<string, object>(
                pi.Name, pi.GetValue( obj, null ) );
        }
    }

    // an extension method for the missing ForEach
    // of IEnumerable<T>
    private static void ForEach<T>(
        this IEnumerable<T> source, Action<T> action )
    {
        foreach (T item in source)
            action( item );
    }
}

The output shows:

Serializing Thingy from:
(String) Name = Kevin
(Int32) ID = 1
(Int32) Age = 44

Deserialized OtherThingy as:
(Single) Age = 44
(String) Name = Kevin

Coolness! It behaved as I had hoped. We'll have more fun with JSON serialization in the next installment.
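If you want to reuse the round trip shown above, it can be wrapped in a pair of small generic helpers. What follows is only a sketch under a few assumptions: JsonHelper, ToJson and FromJson are names I made up for illustration (they are not part of the .NET Framework), and the hand-written JSON string in Main is there to poke at the member-order question rather than to prove anything about it.

```csharp
using System;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Json;
using System.Text;

[DataContract]
public class Thingy
{
    [DataMember] public string Name { get; set; }
    [DataMember] public int ID { get; set; }
    [DataMember] public int Age { get; set; }
}

[DataContract]
public class OtherThingy
{
    [DataMember] public float Age { get; set; }
    [DataMember] public string Name { get; set; }
}

// JsonHelper is a hypothetical helper, not a framework class
public static class JsonHelper
{
    // dehydrate any data contract type into JSON text
    public static string ToJson<T>( T value )
    {
        var ser = new DataContractJsonSerializer( typeof( T ) );
        using (var mstrm = new MemoryStream())
        {
            ser.WriteObject( mstrm, value );
            // ToArray copies only the bytes that were written
            return Encoding.UTF8.GetString( mstrm.ToArray() );
        }
    }

    // rehydrate JSON text into an instance of T
    public static T FromJson<T>( string json )
    {
        var ser = new DataContractJsonSerializer( typeof( T ) );
        var bytes = Encoding.UTF8.GetBytes( json );
        using (var mstrm = new MemoryStream( bytes ))
        {
            return (T) ser.ReadObject( mstrm );
        }
    }
}

public static class JsonHelperDemo
{
    public static void Main()
    {
        // Thingy -> JSON -> OtherThingy, as in the program above
        var json = JsonHelper.ToJson(
            new Thingy { ID = 1, Name = "Kevin", Age = 44 } );
        Console.WriteLine( json );
        var t2 = JsonHelper.FromJson<OtherThingy>( json );
        Console.WriteLine( "{0} is {1}", t2.Name, t2.Age );

        // hand-written JSON with the members in a different
        // order and an ID that OtherThingy doesn't declare
        var t3 = JsonHelper.FromJson<OtherThingy>(
            "{\"Name\":\"Kevin\",\"ID\":1,\"Age\":44}" );
        Console.WriteLine( "{0} is {1}", t3.Name, t3.Age );
    }
}
```

Creating a new serializer on every call keeps the sketch short. If you call these helpers in a tight loop, you may prefer to cache one serializer per type.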


Categories: Architecture | C# | CapTech
Posted by kevin on Thursday, April 30, 2009 8:48 AM

English Words Database from 11 Sources

I am working on a project where I needed a list of English words in a Microsoft SQL Server database. I found some public domain lists of English words at:

ftp://ftp.ox.ac.uk/pub/wordlists/dictionaries

There are 11 interesting word lists here including:

  • Unabridged
  • CRL
  • Roget
  • Unix
  • Antworth
  • Knuth
  • KnuthBritish
  • Englex
  • Shakespeare
  • Pocket
  • UU.net

Most of these lists haven't been updated since the mid-1990s, so if you find a more up-to-date (free) source of English words, please let me know. I loaded all the data into a table that has these attributes:

  • [WordGuid] [uniqueidentifier] NOT NULL
  • [WordText] [nvarchar](30) NOT NULL
  • [WordLength] [tinyint] NOT NULL
  • [SoundexGroup] [nchar](1) NOT NULL
  • [SoundexValue] [smallint] NOT NULL
  • [GroupId] [smallint] NULL
  • [IsPalindrome] [bit] NOT NULL
  • [InUnabr] [bit] NOT NULL
  • [InAntworth] [bit] NOT NULL
  • [InCRL] [bit] NOT NULL
  • [InRoget] [bit] NOT NULL
  • [InUnix] [bit] NOT NULL
  • [InKnuthBritish] [bit] NOT NULL
  • [InKnuth] [bit] NOT NULL
  • [InEnglex] [bit] NOT NULL
  • [InShakespeare] [bit] NOT NULL
  • [InPocket] [bit] NOT NULL
  • [InUUNet] [bit] NOT NULL

The [WordGuid] is actually the MD5 hash of the [WordText] expressed as a UNIQUEIDENTIFIER, so it makes a nice universal primary key. I've precomputed the [WordLength], [IsPalindrome] and a couple of Soundex values to make querying the table a bit more efficient. I've also computed a [GroupId] for each word. All words that share a [GroupId] are composed of exactly the same letters in different orders, so you could, for example, find all the whole-word anagrams of a given word using its [GroupId]. Finally, I've created a handful of [In*] flags to tell me which word file(s) each word was sourced from. I've made the database available in two forms below:

Attachable (as MDF/LDF) Microsoft SQL Server 2005 Database (21.20 MB)

Tab-delimited CSV File with Table Creation Script (11.10 MB)

Please see the licenses in the files at the source web site listed at the top of this post. All of the licenses are academic and free for use but your company may want to read and catalog them for full compliance.
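For anyone who wants to recompute a few of the precomputed columns described above, here is a minimal C# sketch. It is not the code that built the database. WordFacts and its method names are hypothetical, the choice of UTF-16 bytes for the MD5 input is an assumption (so the GUIDs it produces may not match the ones in the download), and the sorted-letter key is just one way of deciding which words belong in the same anagram group before numbering the groups.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// WordFacts and its members are hypothetical names for illustration
public static class WordFacts
{
    // Assumption: the word is hashed as UTF-16LE bytes, the way
    // SQL Server stores nvarchar data; the original may differ.
    public static Guid WordGuid( string word )
    {
        using (var md5 = MD5.Create())
        {
            // an MD5 hash is 16 bytes, the same size as a Guid
            var hash = md5.ComputeHash(
                Encoding.Unicode.GetBytes( word ) );
            return new Guid( hash );
        }
    }

    // Assumption: the comparison ignores case
    public static bool IsPalindrome( string word )
    {
        var s = word.ToLowerInvariant();
        for (int i = 0, j = s.Length - 1; i < j; i++, j--)
        {
            if (s[i] != s[j])
                return false;
        }
        return true;
    }

    // words with equal keys are made of exactly the same letters,
    // so each distinct key could be assigned one group number
    public static string AnagramKey( string word )
    {
        var letters = word.ToLowerInvariant().ToCharArray();
        Array.Sort( letters );
        return new string( letters );
    }
}

public static class WordFactsDemo
{
    public static void Main()
    {
        foreach (var w in new[] { "listen", "silent", "level" })
        {
            Console.WriteLine( "{0}: guid={1} palindrome={2} key={3}",
                w, WordFacts.WordGuid( w ),
                WordFacts.IsPalindrome( w ),
                WordFacts.AnagramKey( w ) );
        }
    }
}
```

Deriving the key from the text means the same word gets the same identifier no matter which source file it came from, which is what makes merging the 11 lists straightforward.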

Enjoy!


Categories: Fun
Posted by kevin on Saturday, April 04, 2009 8:47 PM

Language Features Versus Tool Features

F4 Phantom

The physics guys in my college fraternity often joked that the F4 Phantom aircraft was proof that, with enough horsepower, even a brick could fly. In the software development world, we Microsoft developers have a similar joke:

"With enough Eclipse plug-ins, even Java becomes a useful programming language."

OK, I know I'll take a lot of heat for that. I certainly appreciate that the emergence of Java in the mid-1990s rescued me from the minutiae and doldrums of C++. But once you become exposed to the expressiveness of the C# language and the richness of the .NET Framework Class Library, you just cannot go back. I've worked in both .NET and Java environments extensively and I get more and better work done in C#. End of that story. And I apologize to my Java-oriented friends for the bad joke.

But there's an interesting phenomenon happening now that the joke sort of highlights. C# is mature and stable. It's about ten years old at the time of this writing. C# has grown considerably over that decade, acquiring lots of really cool new features:

  • Generic Classes and Methods
  • Anonymous Methods with Captured Outer Variables (Closures)
  • Nullable Value Types
  • Streaming Iterators
  • Partial Types
  • Static Classes
  • Partial Classes
  • Implicit Typing of Local Variables
  • Object Initializers
  • Collection Initializers
  • Anonymous Types
  • Lambda Expressions
  • Expression Trees
  • Extension Methods
  • Automatic Properties
  • Query Comprehension Syntax

And this list doesn't begin to describe all the changes to the CLR and FCL to support these great language features. Many of my friends who code in Java every day are openly unhappy about how slowly Java has been evolving as a language during this period. While Sun seems happy to add wonderful new features to the JVM and the class libraries all the time, they seem somewhat reluctant to expose much of that thinking through the language itself. Now that Java has gone Open Source (in a sense), it will be interesting to see how this changes. Certainly ECMA's oversight of the C# specification has been a good thing since the language's inception.

I think that programming languages have inertia. In the physics world, we say that Force equals Mass times Acceleration or F = ma, as Newton's Second Law of Motion describes it. As programming languages go, C# certainly has tremendous force in the marketplace based on the sheer mass of the features being added to the platform and the language over this past decade. And the acceleration that's been achieved through arguably modest adoption of the .NET platform finishes the equation. Java, which has grown much more conservatively as a language, also has great force. Look at the trends from Indeed.com for Java and C# job postings.

These graph lines were harvested from job descriptions across the Internet containing the terms Java and C#. You can see that C# has enjoyed steady growth over the last few years. The gap between job postings mentioning C# and Java even seems to be closing until Q3 of 2008. Even since then, C# has enjoyed statistically steady growth. But Java took a statistically unusual leap at the same time. Was it the release of Eclipse 3.4 (Ganymede) in June 2008 that is reflected in this graph? It would be interesting to hear your feedback on that.

It's clear that the ecosystem that exists to support the Java language is much richer than that which exists in the .NET space. Visual Studio, as a development environment, is highly integrated. I can load half a dozen projects in the debugger at once and step from client to service to database code all in one smooth motion. Microsoft has done a marvelous job with Visual Studio, no doubt. But it is a largely closed platform. The developers at JetBrains have said very publicly that each new compiler and IDE release forces them to rewrite large portions of their code to support the new language features and libraries that appear.

The publicly available information on Visual Studio 2010 is promising in this regard. The rewrite of the GUI using WPF promises to make hooking into the editor interfaces much cleaner. The super-rich ecosystem that's built up around Java, especially through and for Eclipse, makes me really envious. Eclipse is infinitely pluggable, so the hordes of Java developers out there are constantly adding great new free features for the community to use. Of course, I'm not saying that awesome companies like JetBrains should be put out of business by a flood of free alternatives if Visual Studio gained the kind of extensibility that Eclipse has. ReSharper is the market leader and I expect that it would continue to do well, even in an environment where "average" developers can extend the IDE without such intimate knowledge of the compilers, debuggers and editors.

Remember, if you respond to this post, it may not appear immediately. Because of link spammers, I have to approve every comment. Thanks for your patience on that.


Categories: Rant
Posted by kevin on Friday, April 03, 2009 7:55 PM