Nathan Evans' Nemesis of the Moment

Super fast way to extract width/height dimensions of PNG and JPEG images

Posted in .NET Framework, F# by Nathan B. Evans on April 17, 2015

Turns out that getting the width and height of an image file can be quite tricky if you don’t want to read the whole file into memory. The System.Drawing.Image type in .NET will read it all into memory; not good.

So I read the PNG and JPEG specifications and came up with this.

PNG was easy, as it has a static file header where the width and height are always stored in the same place. But JPEG is far, far trickier since its header is built up of segments which are not in any particular order. There can be dozens of these header segments. There is always a particular segment called a Start Of Frame (SOF), which is the one that contains the width/height.

I’ve tried to build this as robustly and defensively as possible since I intend to use it on both the server side and on low-memory mobile devices. It is also good at detecting invalid or malformed files and failing fast on those conditions.

It supports big and little endian architectures. And it supports memory streams, file streams and network streams, i.e. both seekable and unseekable streams.

The JPEG implementation uses a little mutable state for performance and memory conservation reasons.

ImageSize for F#

namespace nbevans.Util

/// Provides services to extract header or metadata information from image files of supported types.
module ImageSize =
    open System
    open System.IO

    type DimensionsResult =
    | Success of Width : int * Height : int
    | Malformed of Information : string
    | NotSupported

    module private Utils =
        let ntoh (buffer:byte[]) index length =
            if BitConverter.IsLittleEndian then Array.Reverse(buffer, index, length)
            buffer

        let ntoh_int32 buffer index =
            BitConverter.ToInt32(ntoh buffer index 4, index)

        let ntoh_int16 buffer index =
            BitConverter.ToInt16(ntoh buffer index 2, index)

        type Stream with
            /// Advances the position of the stream by seeking by a specified offset from the current position.
            /// Unlike Seek(), this method is safe for network streams and other types of stream where seeking is not possible.
            /// For such "unseekable" streams, data will be read instead and immediately discarded.
            /// Returns the number of bytes the position was actually advanced by.
            member stream.Advance(offset) : int64 =
                if stream.CanSeek then
                    let before = stream.Position
                    stream.Seek(int64 offset, SeekOrigin.Current) - before
                else
                    let buffer = Array.zeroCreate offset
                    int64 (stream.Read(buffer, 0, offset))

    open Utils

    module Png =
        /// Gets the width & height dimensions of a PNG image.
        let dimensions (sourceStream:Stream) =
            let signature = Array.zeroCreate 8
            if sourceStream.Read(signature, 0, 8) <> 8 || signature <> [| 137uy; 80uy; 78uy; 71uy; 13uy; 10uy; 26uy; 10uy |] then
                NotSupported
            else
                let chunk = Array.zeroCreate 8
                if sourceStream.Advance(8) <> 8L || sourceStream.Read(chunk, 0, 8) <> 8 then
                    Malformed "Expected chunk is not present."
                else
                    Success (ntoh_int32 chunk 0, ntoh_int32 chunk 4)

    module Jpeg =
        module private Markers =
            // All data markers are 2 bytes, where the first byte is a 0xFF prefix.
            let Prefix = 0xFFuy
            // The first data marker (i.e. first 2 bytes of the file) of every JPEG is this.
            let SOI_StartOfImage = 0xD8uy
            // JPEG has lots of different internal encoding types, which are indicated with a SOF data marker.
            // There are many like baseline, progressive, sequential, differential and various combinations of these too.
            // Fortunately the width/height is present in the same position of all of these SOF headers.
            let SOFn_StartOfFrame = [| 0xC0uy; 0xC1uy; 0xC2uy; 0xC3uy; 0xC5uy; 0xC6uy; 0xC7uy; 0xC9uy; 0xCAuy; 0xCBuy; 0xCDuy; 0xCEuy; 0xCFuy |]
            let SOS_StartOfScan = 0xDAuy

        open Markers

        /// Gets the width & height dimensions of a JPEG image.
        let dimensions (sourceStream:Stream) =
            let signature = Array.zeroCreate 2
            if sourceStream.Read(signature, 0, 2) <> 2 || signature.[0] <> Prefix || signature.[1] <> SOI_StartOfImage then
                NotSupported
            else
                let mutable result = Option<DimensionsResult>.None
                let marker = Array.zeroCreate 4

                while result.IsNone do
                    result <-
                        if sourceStream.Read(marker, 0, 4) <> 4 then
                            Some <| Malformed "Next data marker header cannot be read."

                        else if marker.[0] = Prefix && SOFn_StartOfFrame |> Array.exists ((=) marker.[1]) then
                            // Reuse the marker array as a new buffer, skip over the first byte in the payload (which contains "sample precision"),
                            // and read the 4 bytes that contain the two 16-bit values of the height (lines) and width (samples per line), respectively.
                            let buffer = marker
                            sourceStream.Advance(1) |> ignore
                            if sourceStream.Read(buffer, 0, 4) <> 4 then
                                Some <| Malformed "SOF data marker payload cannot be read."
                            else
                                let lines = int <| ntoh_int16 buffer 0
                                let samplesPerLine = int <| ntoh_int16 buffer 2
                                Some <| Success (samplesPerLine, lines)

                        else if marker.[0] = Prefix && marker.[1] = SOS_StartOfScan then
                            // If we've reached the SOS marker then we missed the SOF marker.
                            // That's pretty bizarre and suggests a corrupt JPEG, or at least an unsupported SOF marker.
                            Some <| Malformed "SOS data marker was encountered prematurely."

                        else if marker.[0] <> Prefix then
                            // All data marker identifiers are 2 bytes and the first byte must be 0xFF.
                            Some <| Malformed "Next data marker header is malformed."

                        else
                            // After the data marker identifier is a 2 byte length (inclusive) of the payload.
                            // We need this to let us skip over the markers/payloads that are not interesting.
                            let length = (int <| ntoh_int16 marker 2) - 2
                            sourceStream.Advance(length) |> ignore
                            None

                defaultArg result (Malformed "End of data markers encountered prematurely.")
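
Usage ends up looking something like this (just a sketch; the file path is obviously made up):

open System.IO
open nbevans.Util.ImageSize

use fs = File.OpenRead @"C:\photos\example.jpg"
match Jpeg.dimensions fs with
| Success (width, height) -> printfn "%ix%i" width height
| Malformed info -> printfn "Malformed JPEG: %s" info
| NotSupported -> printfn "Not a JPEG"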

Super skinny XML document generation with F#

Posted in .NET Framework, F# by Nathan B. Evans on April 15, 2015

I needed to generate some simple XML documents, in memory, from some F# script. From my C# days I was already familiar with the System.Xml.Linq namespace, which I still quite like. But it wasn’t particularly clean to use from F#. So I wrote a really simple F# wrapper for some of its most commonly used features.

XmlToolkit for F#

module nbevans.Util.XmlToolkit
open System.Text
open System.Xml
open System.Xml.Linq
open System.IO

let XDeclaration version encoding standalone = XDeclaration(version, encoding, standalone)
let XLocalName localName namespaceName = XName.Get(localName, namespaceName)
let XName expandedName = XName.Get(expandedName)
let XDocument xdecl content = XDocument(xdecl, content |> Seq.map (fun v -> v :> obj) |> Seq.toArray)
let XComment (value:string) = XComment(value) :> obj
let XElementNS localName namespaceName content = XElement(XLocalName localName namespaceName, content |> Seq.map (fun v -> v :> obj) |> Seq.toArray) :> obj
let XElement expandedName content = XElement(XName expandedName, content |> Seq.map (fun v -> v :> obj) |> Seq.toArray) :> obj
let XAttributeNS localName namespaceName value = XAttribute(XLocalName localName namespaceName, value) :> obj
let XAttribute expandedName value = XAttribute(XName expandedName, value) :> obj

type XDocument with
    /// Saves the XML document to a MemoryStream using UTF-8 encoding, indentation and character checking.
    member doc.Save() =
        let ms = new MemoryStream()
        use xtw = XmlWriter.Create(ms, XmlWriterSettings(Encoding = Encoding.UTF8, Indent = true, CheckCharacters = true))
        doc.Save(xtw)
        xtw.Flush()
        ms.Position <- 0L
        ms

The principle of this module is that it shadows the key type names like System.Xml.Linq.XElement and the others with F# functions that provide the equivalent constructor behaviour but with a more functional signature. Then an XDocument type extension adds a useful Save() function (since the stock ones are so useless on their own). And here is an example usage straight from my app (hopefully you will get the idea):

let doc =
    XDocument (XDeclaration "1.0" "UTF-8" "yes") [
        XComment "This document was automatically generated by a configuration script."
        XElement "Metadata" [
            XElement "SystemMetadata" [
                XElement "ScannedBy" ["PCT"]
                XElement "GenerationDate" [DateTime.UtcNow.ToString("s")]
                XElement "IndexedBy" ["UNKNOWN"]
                XElement "IndexedOn" ["UNKNOWN"]
                XElement "FileName" [createPackageContentFileName cp.Id fileName]
                XElement "ScanInfo" [
                    XElement "NumberOfPagesScanned" [string formPdfPageCount]
                    XElement "IpAddress" ["UNKNOWN"]
                    XElement "MachineName" ["UNKNOWN"]
                    XElement "NumberOfBlankPages" ["0"]
                ]
            ]
            XElement "UserDefinedMetadata" [
                XElement "Address1" [defaultArg (gp.Fields.TryFind "property-address") "UNKNOWN"]
                XElement "Postcode" [defaultArg (gp.Fields.TryFind "property-postcode") "UNKNOWN"]
                XElement "Patchcode" ["1"]
                XElement "Reviewdate" [DateTime.UtcNow.AddYears(1).ToString("s")]
            ]
        ]
    ]

let ms = doc.Save()
// 'ms' at this point contains a System.IO.MemoryStream of the generated XML document.
// That was my use-case, but maybe you will want to adapt this code or the XmlToolkit module itself to use a different type of stream; perhaps a FileStream. I'm a firm believer in K.I.S.S to avoid over-engineering.
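
If you just want the XML as a string (for a quick look while debugging, say), something along these lines works, since Save() rewinds the stream for you:

use reader = new StreamReader(doc.Save())
printfn "%s" (reader.ReadToEnd())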

A simple stereotypical JavaScript-like “debounce” service for F#

Posted in .NET Framework, F#, Xamarin by Nathan B. Evans on August 9, 2014

I needed a simple way to throttle / debounce a text box on Xamarin.Forms to protect against fast user input events. It is basically a text “Entry” control linked to a “ListView” that displays somewhat expensive-to-compute results from a SQLite database. As the user typed their query it would cause lots of concurrent search queries to be fired off, which of course just meant all the queries ran slower and then the results from only one of them would be used anyway.

So I took a look at the Reactive Extensions library but decided it was a bit too heavyweight for a simple Xamarin mobile app. I tried to find something simpler but came up short. So I took a look at the LoDash.js and Underscore.js libraries to see how these did it, but they were littered with mutable and global state – *urgh*.

So I wrote my own using a simple F# agent (MailboxProcessor) and encapsulated it in a type to protect against improper use.

/// Provides a stereotypical JavaScript-like "debounce" service for events.
/// Set initialBounce to true to inject a bounce when the debouncer is first constructed.
type Debounce(timeout, initialBounce, fn) as self =
    let debounce fn timeout = MailboxProcessor.Start(fun agent ->
        let rec loop ida idb = async {
            let! r = agent.TryReceive(timeout)
            match r with
            | Some _ -> return! loop ida (idb + 1)
            | None when ida <> idb -> fn (); return! loop idb idb
            | None -> return! loop ida idb
        }
        loop 0 0)

    let mailbox = debounce fn timeout
    do if initialBounce then self.Bounce()

    /// Calls the function, after debouncing has been applied.
    member __.Bounce() = mailbox.Post(null)
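
Usage then looks something like this (a sketch; searchBox and runSearch stand in for whatever Entry control and query function you actually have):

let debouncer = Debounce(300, false, fun () -> runSearch ())
searchBox.TextChanged.Add(fun _ -> debouncer.Bounce())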

Really simple way to split an F# sequence into chunks / partitions

Posted in .NET Framework, F# by Nathan B. Evans on March 13, 2014

I needed a simple function to split a (potentially infinite) sequence into chunks, suitable for processing. My exact use-case for this was actually in optimising my Azure blob storage uploads. I would split a sequence of 1,000s of items into batches of 60 or so items and then upload them concurrently across 60 connections to the Azure blob store. The performance benefits from this (after also messing around with ServicePointManager’s stupid connection limits and Nagle algorithm stuff) were simply staggering but that’s kind of another story.

I searched high and low for a suitable F# function to do this, but there was nothing. And all the samples I found on the web had design flaws or were overly complex. The design flaws were usually that they would seek the sequence more than once, which is highly inefficient and could even cause side effects depending upon the source of the sequence.

I got frustrated and quickly wrote my own, though I will warn you that it uses mutable state. But as a result it is very fast…

/// Returns a sequence that yields chunks of length n.
/// Each chunk is returned as an array.
let toChunks n (s:seq<'t>) = seq {
    let pos = ref 0
    let buffer = Array.zeroCreate<'t> n

    for x in s do
        buffer.[!pos] <- x
        if !pos = n - 1 then
            yield buffer |> Array.copy
            pos := 0
        else
            incr pos

    if !pos > 0 then
        yield Array.sub buffer 0 !pos }

// Ridiculously imperative, but it works and is performant; won't seek the sequence more than once.
// If you're using in a forward-only manner and won't be holding references to the returned chunks
// then you can get rid of the Array.copy to gain some extra perf and reduce GC.
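
For example, chunking a ten-element sequence into fours:

seq { 1 .. 10 } |> toChunks 4 |> Seq.iter (printfn "%A")
// [|1; 2; 3; 4|]
// [|5; 6; 7; 8|]
// [|9; 10|]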

Here’s the Gist if that’s better for you:

Oh, this is MIT licensed so you know that I won’t come calling.



Quick and dirty literal port of my PBKDF2 password hash function from C# to F#

Posted in .NET Framework, F# by Nathan B. Evans on March 13, 2014

Now that I’m fully on board the F# bandwagon I’ve found myself wanting to refactor some of my old utility functions that I’ve had for years in C# land. Sure, I could just reference my C# assemblies, and probably should have. But there’s something nice about porting some code over to your shiny new language, if only just as a learning exercise.

module Crypto.Pbkdf2
open System
open System.Security.Cryptography

let private subkeyLength = 32
let private saltSize = 16

/// Hashes a password by a specified number of iterations using the PBKDF2 crypto function.
let hash password iterations =
    use algo = new Rfc2898DeriveBytes(password, saltSize, iterations)
    let salt = algo.Salt
    let bytes = algo.GetBytes(subkeyLength)

    let iters = if BitConverter.IsLittleEndian then BitConverter.GetBytes(iterations) else BitConverter.GetBytes(iterations) |> Array.rev

    let parts = Array.zeroCreate<byte> 54
    Buffer.BlockCopy(salt, 0, parts, 1, saltSize)
    Buffer.BlockCopy(bytes, 0, parts, 17, subkeyLength)
    Buffer.BlockCopy(iters, 0, parts, 50, sizeof<int>)

    Convert.ToBase64String(parts)


/// Hashes a password using 10,000 iterations of the PBKDF2 crypto function.
let fastHash password = hash password 10000

/// Hashes a password using 100,000 iterations of the PBKDF2 crypto function.
let strongHash password = hash password 100000

/// Hashes a password using 300,000 iterations of the PBKDF2 crypto function.
let uberHash password = hash password 300000

/// Verifies a PBKDF2 hashed password with a candidate password.
/// Returns true if the candidate password is correct.
/// The hashed password must have been originally generated by one of the hash functions within this module.
let verify hashedPassword (password:string) =
    let parts = Convert.FromBase64String(hashedPassword)
    if parts.Length <> 54 || parts.[0] <> byte 0 then
        false
    else
        let salt = Array.zeroCreate<byte> saltSize
        Buffer.BlockCopy(parts, 1, salt, 0, saltSize)

        let bytes = Array.zeroCreate<byte> subkeyLength
        Buffer.BlockCopy(parts, 17, bytes, 0, subkeyLength)

        let iters = Array.zeroCreate<byte> sizeof<int>
        Buffer.BlockCopy(parts, 50, iters, 0, sizeof<int>)

        let iters = if BitConverter.IsLittleEndian then iters else iters |> Array.rev

        let iterations = BitConverter.ToInt32(iters, 0)

        use algo = new Rfc2898DeriveBytes(password, salt, iterations)
        let challengeBytes = algo.GetBytes(32)

        match Seq.compareWith (fun a b -> if a = b then 0 else 1) bytes challengeBytes with
        | v when v = 0 -> true
        | _ -> false
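
Typical usage is then just (a quick sketch; the passwords are made up):

let stored = fastHash "correct horse battery staple"
let ok = verify stored "correct horse battery staple"   // true
let bad = verify stored "not the password"              // false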

Here’s the Gist, if it’s better for you:

Sure, the code is quite imperative in style, but it is just a utility function and I literally did a “one pass” refactor from the C# code. It’s not really worth giving it a second pass just for the sake of making it more purely functional.

This is MIT licensed by the way. I ain’t going to come calling.



Point-to-site (P2S) Azure VPN

Posted in Azure, Windows Environment by Nathan B. Evans on March 1, 2014

It seems there are still some bugs to be worked out by the Azure guys with this point-to-site Azure VPN feature.

I have been wanting a secure way to access my Azure virtual machines for some time and I only just noticed they added this feature (still in Preview) a few months back. So I went about setting it up.

I had seemingly followed all the official guides and got past all the hurdles for the certificate creation stuff using the makecert tool (why doesn’t Azure offer to do all this for you? It’s not like everybody that uses Azure has a Visual Studio command shell installed on their PC!).

I then downloaded the 64-bit VPN Package for my Virtual Network. Installed it. Tried to connect and it was throwing this bizarre error:

A certificate chain processed, but terminated in a root certificate which is not trusted by the trust provider.

After some playing around it turned out that the VPN Package installer has a serious bug in it whereby it simply doesn’t install the Azure Gateway’s certificate into your certificate store. Luckily I had a copy of WinRAR on my machine, so I extracted the installer to take a peek inside and sure enough it contains a .cer certificate file. So I did WinKey+R, mmc, and added a Certificates snap-in for the Local Machine (not Current User!). Then I navigated to Trusted Root Certification Authorities, right-clicked and chose the Import task. Find the .cer file you extracted from the VPN Package installer and install it.

Now retry the Azure VPN Connection and the error should go away and you’ll log straight in!


Xamarin/MvvmCross + Autofac

Posted in .NET Framework, Software Design, Xamarin by Nathan B. Evans on February 17, 2014

Recently I’ve been doing some Xamarin development and naturally once a mobile app project reaches a certain size you need to factor it away from just a hacky prototype app towards a sustainable design that ticks all the SOLID boxes.

I researched a handful of various approaches to adopting the MVVM pattern, which included:

  • MvvmCross (OSS, also known as “Mvx”)
  • ReactiveUI (OSS, built on top of the excellent Reactive Extensions / RX library)
  • Crosslight (commercial)

At this time the project didn’t warrant spending $999 on Crosslight (and I can only assume it must be very good at that price). So I veered towards an OSS solution. ReactiveUI is by far the more elegantly designed of the two. However, its mobile platform support is relatively new and it is very focused on solving only the MVVM problem. MvvmCross is more of a framework that helps you in a number of different ways concerning mobile app development, including providing a dependency injection / IOC layer and numerous platform-specific extensions for Android, iOS etc. MvvmCross is, however, in my opinion a little “rough around the edges” and has most definitely been put together more like a Panzer Tank than a Lotus Elise.

Ultimately I adopted MvvmCross though, as it has the greatest momentum in the Xamarin ecosystem, and I feel that is important.

One big issue with MvvmCross is that it seems to take on the responsibility of providing a really rather crap IOC container implementation. I’m not sure why it doesn’t just depend upon Autofac or TinyIoc or something like that as it seems like 30% of the code in the MvvmCross codebase could be stripped out if it just farmed out that responsibility to another OSS project. Literally everywhere you look in the MvvmCross codebase there are “factory” and “registry” and, uh, “singleton” components everywhere. Maybe Autofac has spoilt me over the years but I honestly can’t remember the last time I had to “hand roll” such a boilerplate component.

So I set about solving this problem by writing an Autofac adapter for MvvmCross. It turned out to be a lot simpler than I first thought, after working through various nuances of MvvmCross.


I chose to place this type in a separate assembly intended for all my “Autofac for MvvmCross” related extensions. It is a PCL assembly, since Autofac is fully PCL compatible, even with Xamarin.

public class AutofacMvxIocProvider : MvxSingleton<IMvxIoCProvider>, IMvxIoCProvider {
    private readonly IContainer _container;

    public AutofacMvxIocProvider(IContainer container) {
        if (container == null) throw new ArgumentNullException("container");
        _container = container;
    }

    public bool CanResolve<T>() where T : class {
        return _container.IsRegistered<T>();
    }

    public bool CanResolve(Type type) {
        return _container.IsRegistered(type);
    }

    public T Resolve<T>() where T : class {
        return (T)Resolve(typeof(T));
    }

    public object Resolve(Type type) {
        return _container.Resolve(type);
    }

    public T Create<T>() where T : class {
        return Resolve<T>();
    }

    public object Create(Type type) {
        return Resolve(type);
    }

    public T GetSingleton<T>() where T : class {
        return Resolve<T>();
    }

    public object GetSingleton(Type type) {
        return Resolve(type);
    }

    public bool TryResolve<T>(out T resolved) where T : class {
        return _container.TryResolve(out resolved);
    }

    public bool TryResolve(Type type, out object resolved) {
        return _container.TryResolve(type, out resolved);
    }

    public void RegisterType<TFrom, TTo>()
        where TFrom : class
        where TTo : class, TFrom {

        var cb = new ContainerBuilder();
        cb.RegisterType<TTo>().As<TFrom>().AsSelf();
        cb.Update(_container);
    }

    public void RegisterType(Type tFrom, Type tTo) {
        var cb = new ContainerBuilder();
        cb.RegisterType(tTo).As(tFrom).AsSelf();
        cb.Update(_container);
    }

    public void RegisterSingleton<TInterface>(TInterface theObject) where TInterface : class {
        var cb = new ContainerBuilder();
        cb.RegisterInstance(theObject).As<TInterface>().AsSelf();
        cb.Update(_container);
    }

    public void RegisterSingleton(Type tInterface, object theObject) {
        var cb = new ContainerBuilder();
        cb.RegisterInstance(theObject).As(tInterface).AsSelf();
        cb.Update(_container);
    }

    public void RegisterSingleton<TInterface>(Func<TInterface> theConstructor) where TInterface : class {
        var cb = new ContainerBuilder();
        cb.Register(cc => theConstructor()).As<TInterface>().AsSelf().SingleInstance();
        cb.Update(_container);
    }

    public void RegisterSingleton(Type tInterface, Func<object> theConstructor) {
        var cb = new ContainerBuilder();
        cb.Register(cc => theConstructor()).As(tInterface).AsSelf().SingleInstance();
        cb.Update(_container);
    }

    public T IoCConstruct<T>() where T : class {
        return (T)IoCConstruct(typeof(T));
    }

    public object IoCConstruct(Type type) {
        return Resolve(type);
    }

    public void CallbackWhenRegistered<T>(Action action) {
        CallbackWhenRegistered(typeof(T), action);
    }

    public void CallbackWhenRegistered(Type type, Action action) {
        _container.ComponentRegistry.Registered += (sender, args) => {
            if (args.ComponentRegistration.Services.OfType<TypedService>().Any(x => x.ServiceType == type)) {
                action();
            }
        };
    }
}

I’m just showing my MvxAndroidSetup implementation here, but the iOS, Windows Phone etc. implementations would obviously look basically the same.

public class Setup : MvxAndroidSetup {
    private static Assembly CoreAssembly { get { return typeof(App).Assembly; } }

    public Setup(Context applicationContext) : base(applicationContext) { }

    protected override IMvxApplication CreateApp() {
        return new App();
    }

    protected override IMvxIoCProvider CreateIocProvider() {
        var cb = new ContainerBuilder();

        // I like to structure my app using Autofac modules.
        // It keeps everything very DRY and SRP compliant.
        // Ideally, these Autofac modules would be held in a separate PCL so they can be used
        // by Android / iOS / WP platforms without violating DRY.
        cb.RegisterAssemblyModules(CoreAssembly);

        // This is an important step that ensures all the ViewModels are loaded into the container.
        // Without this, it was observed that MvvmCross wouldn't register them by itself; needs more investigation.
        cb.RegisterAssemblyTypes(CoreAssembly)
            .AssignableTo<MvxViewModel>()
            .As<IMvxViewModel, MvxViewModel>()
            .AsSelf();

        return new AutofacMvxIocProvider(cb.Build());
    }
}


This enables me to use Autofac unhindered from my Xamarin mobile apps. It allows the codebase to remain consistent by only using one IOC container, which helps minimise complexity, encourages more DRY code and, in the future, would lower the barriers to getting more developers up to speed with the whole codebase. Autofac is by far the best IOC container available for .NET, and having it available for use in Xamarin, coupled with MvvmCross, provides a major improvement in productivity for me.

IoC containers: where you define the seams of applications

Posted in .NET Framework, Software Design by Nathan B. Evans on April 10, 2013

A few colleagues asked me to do a quick write-up about the proper use of an IoC container, particularly concerning what types you DO and DON’T register into the container. So here we go:

Things that you do and don’t wire up into an IoC container.

The big ones, the seams of the application

Components that are inherently cross-cutting concerns, and need to be “available everywhere” for possible injection. Things like:

  • Logging, tracing and instrumentation
  • Authentication and authorization
  • Configuration
  • Major application services (this includes things like the Controllers in a MVC web app)

Components that will be modularised as plug-ins / add-ins, things that get loaded dynamically. Consider using MEF as the discovery mechanism of these components.

Services with multiple implementations that can be “dynamically selected” through some means (app.config, differing registrations per DEBUG and RELEASE modes at compile-time, per-tenant configuration, etc.)

The little ones, the stylistic ones and where you “lean” on the power of your container to provide infrastructure services or as a development aid

Components that require lifetime scoping or management (transactions, sessions, units of work) and other IDisposable-like things that are longer lived than just a one-off use.

Components that are single instance. Never write “static” components.

Components that require testing / mocking out, etc. Note: I consider this to be a “development aid” and not at all mandatory.

When you want an “automatic factory” (Autofac isn’t called that for no reason!). A simple inline Func<ISomeService> expression is cleaner than going down the stereotypical Java “Enterprise” route of manually rolling out a SomeServiceFactory class each time. Though that’s more as a result of the sad fact that they still don’t have lambdas.

And now the things that you leave out of the container

Anything that is never, and never has any need to be, referenced outside of the module it is within.

Implementation details of a module. Your container registrations should be the facade that hides the complexities of how that module works.

Things that are essentially just DTOs, entities, POCOs, other dumb types, etc.

Little utility, helper functions.

Note: I refer to “module” a few times. This is in no way a direct reference to an assembly or package. It’s more in reference to a namespace, because components typically reside within a relatively self-contained namespace with a container registration module.

Cardinal rules

Never ever call a “static” Kernel.Get / Resolve, or whatever equivalent your container might expose, anywhere. This is not dependency injection. It is service location. Which is a whole different pattern entirely. Autofac is quite neat in that it’s one of very few containers that actually does not, out of the box, provide any sort of “static” resolution/service location function. And that is good.

Only call Get / Resolve methods in your bootstrap code at the root of your object graph. And even then, there should only be less than a handful of such calls. If you can get it down to just one, then you’ve done well and you probably have an object graph that is very well expressed.

Always keep the object graph in the back of your mind. It’s a shame, in my opinion, that containers tend to keep this information hidden away in their internals. The only time you get a glimpse of it is in the exception message for when you’ve inadvertently introduced a circular dependency. Things could be so much better than this.

If you have a component that’s requiring injection of more than about five dependencies, then it should start coming onto your “radar of suspiciousness”. If it reaches about eight to nine dependencies you should almost certainly consider refactoring it and, probably, the wider namespace or module as a whole. I often see this happen on Controllers in MVC applications; the so-called “fat controller” problem. Thankfully, because the dependencies are already “well expressed” (it’s just that there are too many of them), refactoring such problem areas of the codebase is normally a relatively straightforward task.

Nothing except your bootstrapper and container modules should reference the container, i.e. its namespaces. Arguably, your bootstrapper and container modules can be in a totally separate assembly by themselves and only that assembly holds references to your container’s assemblies. If you’re seeing namespace imports for your container all over your projects then something is very badly wrong.

Avoid the use of “service locator injections”, such as IComponentContext in Autofac. This is one of the very few ways that Autofac allows you to shoot yourself in the foot. It’s not quite as bad as a “static” Kernel.Get style service locator, but it’s still pretty damn bad, as it implies you don’t actually know what dependencies your component might have, which should never be the case. To avoid this, express your dependencies better. If there are multiple instances you wish to dynamically “select” from at runtime then you can roll your own resolution delegate function and lean on your container to implement it. Autofac makes this very easy using its IIndex relationship. For example:

public delegate IMyService MyServiceResolver(string name);

// ... this stuff below goes in your container module ...
Func<IComponentContext, MyServiceResolver> tmp = c => {
    var indexedServices = c.Resolve<IIndex<string, IMyService>>();
    return name => indexedServices[name];
};
builder.Register(tmp);

builder.Register(c => new MyService())
       // The "keyed on" value is a string in this example.
       // But, usefully, it can be any object including value types such as an enum.
       .Keyed<IMyService>("Fred");

// ... any time I want to resolve an IMyService, I can just do this in a constructor:
class SomeOtherComponent {
    private readonly IMyService myService;
    public SomeOtherComponent(MyServiceResolver myServiceResolver) {
        if (myServiceResolver == null)
            throw new ArgumentNullException("myServiceResolver");
        this.myService = myServiceResolver("Fred");
        // Technically this is a form of service location.
        // However, because we have constrained the number of services that
        // can be resolved to a particular *type*; then this does not
        // introduce any bad practices to the codebase.
        // Most importantly, we are not relying on any "static" magic.
        // (Which is the absolute hallmark of truly bad service location.)
        // Nor are we holding any references to the container.
    }
}
Example of a Bootstrapper, Container Module and general structure of your Program Root

This is a little snippet of a relatively well-structured IoC server application. I’ve added some relevant comments to it.

public class Program {
    private static IContainer Container { get; set; }
    private static ILog Log { get; set; }
    private static ProgramOptions Options { get; set; }
    private static Lazy<HostEntryPoint> Host { get; set; }

    public static void Main(string[] args) {
        try {
            Options = new ProgramOptions();
            if (!Parser.Default.ParseArguments(args, Options))
                return;

            // Root of the program.
            // Bootstraps the container then resolves two components.
            // One for logging services in the root (this) and the other
            // is the *actual* entry point of the application.
            Container = new Bootstrapper().CreateContainer();
            Log = Container.Resolve<ILog>(TypedParameter.From("boot"));
            Host = Container.Resolve<Lazy<HostEntryPoint>>();

            if (Options.RunAsService) {
                // ... cut for brevity ...
            }

        } catch (Exception x) {
            // Arguably one of the very few places catching a plain
            // Exception can make sense: at the root of the program.
            Log.FatalException("Unexpected error occurred whilst starting.", x);
        }
    }

    // ... cut for brevity ...
}

internal class Bootstrapper {
    public virtual IEnumerable<IModule> GetModules() {
        yield return new LoggingModule {
            Console = LoggingMode.TraceOrAbove,
            File = LoggingMode.WarningOrAbove,
            Debug = LoggingMode.Off,
            RegisterNetFxTraceListener = true
            // Container Modules are an excellent place to pass in
            // certain configuration/runtime parameters and options.
            // I prefer to "hard code" things like this until there
            // is a *real* need to expose such things to a config file,
            // and hence the user of the application.
        };

        // These modules can be specified in any order.
        // Container will resolve the object graph at
        // build-time not at registration-time.

        yield return new QueuesModule() { ConcurrentReceivers = 4 };
        yield return new DispatchersModule();
        yield return new HciCommandsModule();
        yield return new MefModulesModule();
        yield return new AzureDataModule() { ConnectionString = "<goes here>" };
    }

    public virtual void RegisterCore(ContainerBuilder builder) {
        // As well as typically only ever using constructor injection...
        // I prefer to explicitly define the dependency resolutions here, each time.
        // That is, in my opinion, half the point of IoC. You're doing it to keep
        // very close tabs on your dependency graphs. So it should certainly not
        // be the norm that you let the container resolve them through its automagicness.
        // An exception to this rule is dynamically loaded modules (such as MEF assemblies)
        // where you cannot possibly know, at compile-time, what dependencies are required.
        builder.Register(c =>
                         new HostEntryPoint(
                             /* ... explicit c.Resolve<...>() arguments cut for brevity ... */));
    }

    public IContainer CreateContainer() {
        var builder = new ContainerBuilder();

        foreach (var module in GetModules())
            builder.RegisterModule(module);

        RegisterCore(builder);

        return builder.Build();
    }
}
I’m open to feedback and discussion 🙂

Azure Table Storage versus SQLite (on Azure Blob Store)

Posted in .NET Framework, Distributed Systems, Uncategorized by Nathan B. Evans on March 31, 2013

I’ve been trying to decide what storage platform I should use for my application.

It will be storing what are essentially ever-growing (but potentially prune-able past a certain age, say 1 to 3 years) transaction logs. Each record consists of four timestamp values (each 64-bits wide), three 32-bit integer values, and three text fields: two are generally of constrained length, say a maximum of 256 characters, and one is typically longer but hopefully not more than about 1KB in the worst case.

Having tried out SQLite on my local machine (which has an SSD), I managed to insert 60,000 of these records in about 1 second flat. I was impressed but cautious, because SQLite isn’t really a cloud-ready data store and it would require quite a bit of work wrapping it up with concurrency handling to make it do what I’d need it to do. But I could not ignore that it was fast.

When I first read up about Azure Table Storage, I was a bit underwhelmed. It just seemed incredibly bloated and inefficient. It uses XML as its serialization transport. It uses HTTP/S as its network transport (and there is no fast low-level interface available like there is for Azure Service Bus). If you’ve ever used ProtoBufs, getting to know Azure Table Storage is a depressing experience. You can see the wastage but there is nothing you can do. Sure, you can override the serialization to remove its reliance on reflection and shorten up the property names, but that’s only half the story.

I persisted anyway, and dived into Table Storage to give it a proper go and see what it could do.

I ran into a couple of problems, mostly with the .NET Client API. I was submitting a batch of approx. 600 entities. It came back with a rather vague and puzzling exception:

Microsoft.WindowsAzure.Storage.StorageException was caught
  Message=Unexpected response code for operation : 99
       at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](StorageCommandBase`1 cmd, IRetryPolicy policy, OperationContext operationContext)
       at Microsoft.WindowsAzure.Storage.Table.TableBatchOperation.Execute(CloudTableClient client, String tableName, TableRequestOptions requestOptions, OperationContext operationContext)
       at Microsoft.WindowsAzure.Storage.Table.CloudTable.ExecuteBatch(TableBatchOperation batch, TableRequestOptions requestOptions, OperationContext operationContext)
       at Tests.WazTableSandbox.Write()

Nothing of any worth showed up on Google about this. I dug into it a bit further and noticed the extended exception information mentioned something about “InvalidInput” and “99:One of the request inputs is not valid.” Not really that useful still. Even Googling these gave me no clues as to what was wrong.

I read somewhere that Azure Table Storage batches are limited to 100 entities per batch. So I wrote a quick LINQ GroupBy to batch up my dataset by partition key (yes, that’s another requirement; batches of operations must all be for the same partition key). Fortunately, the exception went away once I was grouping them into batches of 100 correctly. Surely the .NET Client API deserves a better and more self-explanatory exception message for this edge case though? It’s blatantly going to be the first problem any developer encounters when trying to use CloudTable.ExecuteBatch().
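
For what it’s worth, the shape of that grouping looks roughly like this (an F# sketch rather than my actual test code; entities and table are stand-ins for your own data and CloudTable):

open Microsoft.WindowsAzure.Storage.Table

// 'entities' (a seq<ITableEntity>) and 'table' (a CloudTable) are assumed to already exist.
entities
|> Seq.groupBy (fun (e:ITableEntity) -> e.PartitionKey)
|> Seq.collect (fun (_, group) ->
    group
    |> Seq.mapi (fun i e -> i / 100, e)      // no more than 100 operations per batch
    |> Seq.groupBy fst
    |> Seq.map (snd >> Seq.map snd))
|> Seq.iter (fun batch ->
    let op = TableBatchOperation()
    batch |> Seq.iter (fun e -> op.Insert(e))
    table.ExecuteBatch(op) |> ignore)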

With that solved, I continued with my tests.

My test data was batched up, by partition key, into these batch sizes: 26, 28, 22, 46, 51, 61, 32, 14, 46, 34, 31, 42, 59 and 8.

I then wrote some test code for SQLite that mirrored what I was doing with the Table Storage. I made sure to use a SQLite transaction per batch, so that each batch would be written as an atomic unit. I purposefully gave SQLite an advantage by “preparing” the command (i.e. pre-compiling the byte code for the SQL command).

I deployed my test program onto an Azure VM (“extra small”, if it matters?) and ran it. Here’s what came out:

Executing batch of 26
Executing batch of 28
Executing batch of 22
Executing batch of 46
Executing batch of 51
Executing batch of 61
Executing batch of 32
Executing batch of 14
Executing batch of 46
Executing batch of 34
Executing batch of 31
Executing batch of 42
Executing batch of 59
Executing batch of 8

Executing batch of 26
Executing batch of 28
Executing batch of 22
Executing batch of 46
Executing batch of 51
Executing batch of 61
Executing batch of 32
Executing batch of 14
Executing batch of 46
Executing batch of 34
Executing batch of 31
Executing batch of 42
Executing batch of 59
Executing batch of 8

So although SQLite was massively faster on my local SSD-powered workstation, it was substantially slower (almost 2x) when running from the Azure VM (and hence on a blob store). This was a bit disappointing, but it gives me confidence that I am using the right data storage tool for the job.

You may be wondering why I even considered SQLite as an option in the first place. Well, good question. I am still on the fence as to whether my application will be “full cloud” or just a half-way house that can be installed somewhere without any cloudy stuff involved. That’s why I wanted to investigate SQLite as it’s a standalone database. I might support both, in which case I would use SQLite for non-cloud deployments and Azure Table Storage for cloud deployments. I still find it disappointing how inefficient the Azure Table Storage has been designed. They really need to introduce a lower-level network transport like the one for Service Bus. And a better, XML-less, serialization format.


Three gotchas with the Azure Service Bus

Posted in .NET Framework, Distributed Systems, Software Design by Nathan B. Evans on March 28, 2013

I’ve been writing some fresh code using Azure Service Bus Queues in recent weeks. Overall I’m very impressed. The platform is good and stable, and the Client APIs (at least in the form of Microsoft.ServiceBus.dll that I’ve used) are quite modern in design and layout. It’s only slightly annoying that the Client APIs seem to use the old-fashioned Begin/End async pattern that was perhaps more in vogue back in the .NET 1.0 to 2.0 days. Why not just return TPL Tasks?

However, there have been a few larger gotchas that I’ve discovered which can quite easily turn into non-trivial problems for a developer to safely work around. These are the sort of problems that can inherently change the way your application is designed.

Deferring messages via Defer()

I’m of the opinion that a Service Bus should take care of message redelivery mechanisms itself. For the most part, Azure Service Bus does this really well. But it supports this slightly bizarre type of return notification called deferral (invoked via a Defer() or BeginDefer() method). This basically sets a flag on the message internally so that it will never be implicitly redelivered by the queue to your application. But the message will fundamentally still exist inside the queue and you can even still Receive() it by asking for it by its SequenceId explicitly. That’s all good and everything, but it leaves your application with a bigger problem. Where does it durably store those SequenceIds so that it knows which messages it has deferred? Sure, you could hold them in-memory; that would be the naive approach and seems to be the approach taken by the majority of Azure books and documentation. But that is, frankly, a ridiculous idea and it’s insulting that authors in the fault-tolerant distributed systems space can even suggest such rubbish. The second problem is of course what sort of retry strategy you adopt for that queue of deferred SequenceIds. Then you have to think about the transaction costs (i.e. money!) involved in whatever retry strategy you employ. What if your system has deferred hundreds of thousands or millions of messages? Consider that those deferred messages were outbound e-mails and they were being deferred because your mail server is down for 2 hours. If you were to retry those messages every 5 seconds, that is a lot of Service Bus transactions that you’ll get billed for.

One wonders why the Defer() method doesn’t support some sort of time duration or absolute point in time as a parameter that could indicate to the Service Bus when you actually want that message to be redelivered. It would certainly be a great help and I can’t imagine it would require that much work in the back-end for the Azure guys.

So how do you actually solve this problem?

For now, I have completely avoided the use of Defer() in my system. When I need to defer a message I simply do not perform any return notification for the message and I allow the PeekLock to expire of its own accord (which the Service Bus handles itself). This approach has the following application design side effects:

  • The deferral and retry logic is performed by the Service Bus entirely. My application does not need to worry about such things and the complexities involved.
  • The deferral retry time is constant and is defined at queue description level. It cannot be controlled dynamically on a per message basis.
  • Your queue’s MaxDeliveryCount, LockDuration and DefaultTimeToLive parameters will become inherently coupled and will need to be explicitly controlled.
    (MaxDeliveryCount x LockDuration) will determine how long a message can be retried for and at what interval. If your LockDuration is 1.5 minutes and you want to retry the message for 1 day then MaxDeliveryCount = (1 day / 1.5 minutes) = 960. A rough sketch of wiring these up follows this list.
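
Something along these lines when creating the queue (ns stands in for your NamespaceManager and the queue path is just an example):

open System
open Microsoft.ServiceBus.Messaging

// 'ns' (a NamespaceManager) is assumed to already exist; "email-outbox" is an illustrative path.
let desc = QueueDescription("email-outbox")
desc.LockDuration <- TimeSpan.FromMinutes 1.5
desc.DefaultTimeToLive <- TimeSpan.FromDays 1.
desc.MaxDeliveryCount <- int (desc.DefaultTimeToLive.TotalMinutes / desc.LockDuration.TotalMinutes) // = 960
ns.CreateQueue(desc) |> ignore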

This is a good stop-gap measure whilst I am iterating quickly. For small systems it can perhaps even be a permanent solution. But sooner or later it will cause problems for me and will need to be refactored.

I think the key to solving this problem is gaining a better understanding of why the message is being deferred in the first place, thereby giving you more control. In my particular application it can only be caused when, for instance, an e-mail server is down or unreachable. So maybe I need some sort of watchdog in my application that (eventually) detects when the e-mail server is down and then actively stops trying to send messages, and maybe even puts the brakes on actually Receive()‘ing messages from the queue in the first place. For those messages that have been received already, maybe there should be a separate queue called something like “email-outbox-deferred” (note the suffix). Messages queued on this would not actually be the real message but simply a pointer record that points back to the SequenceId of the real one on the “email-outbox” queue. When the watchdog detects that the e-mail server has come back up then it can start opening up the taps again. Firstly it would perform a Receive() loop on the “email-outbox-deferred” queue and attempt to reprocess those messages by following the SequenceId pointers back to the real queue. If it manages to successfully send the e-mail then it can issue a Complete() on both the deferred pointer message and the real message, to entirely remove it from the job queue. Otherwise it can Abandon() them both and the watchdog can start from square one by waiting to gain confidence of the e-mail server’s health before retrying again.

The key to this approach is the watchdog. The watchdog must act as a type of valve that can open and close the Receive() loops on the two queues. Without this component you are liable to create long unconstrained loops or even infinite-like loops that will cause you to have potentially massive Service Bus transaction costs on your next bill from Azure.

I believe what I have described here is considered to be SEDA, or “Staged event-driven architecture”. Documentation of this pattern is a bit thin on the ground at the moment. Hopefully this will start to change as enterprise-level PaaS cloud applications gain more and more traction. But if anyone has any good book recommendations… ping me a message.

I’d be interested in learning more about message deferral and retry strategies, so please comment!

Transient fault retry logic is not built into the Client API

Transient faults are those that imply there is probably nothing inherently wrong with your message. It’s just that the Service Bus is perhaps too busy or network conditions dictate that it can’t be handled at this time. Fortunately the Client API includes a nice IsTransient boolean property on every MessagingException. Making good use of this property is harder than it first appears though.

All the Azure documentation that I’ve found makes use of the (rather hideous) Enterprise Library Transient Fault Block pattern. That’s all fine and good. But who honestly wants to be wrapping up every Client API action they do in that? Sure, you can abstract it away again by yourself, but where does it end?
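
If you do roll your own, the shape of it is roughly this (a sketch keyed off the IsTransient property; queueClient is a stand-in for whatever client you are calling):

open System.Threading
open Microsoft.ServiceBus.Messaging

let rec withTransientRetry attempts (action: unit -> 'a) =
    try
        action ()
    with :? MessagingException as x when x.IsTransient && attempts > 1 ->
        Thread.Sleep 2000 // crude fixed back-off; tune or make exponential to taste
        withTransientRetry (attempts - 1) action

// e.g. let msg = withTransientRetry 5 (fun () -> queueClient.Receive())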

It seems odd that the Client API doesn’t have this built in. Why when you invoke some operation like Receive() can’t you specify a transient fault retry strategy as an optional parameter? Or hell, why can’t you just specify this retry strategy at a QueueClient level?

I remain hopeful that this is something the Azure guys will fix soon.

Dead lettering is not the end

You may think that once you’ve dead lettered a message that you’ll not need to worry about it again from your application. Wrong.

When you dead letter a message it is actually just moved to a special sub-queue of your queue. If left untouched, it will remain in that sub-queue forever. Forever. Yes, forever. Yes, a memory leak. Eventually this will bring down your application because your queue will run into its memory limit (which can only be a maximum of 5GB). Annoyingly, most developers are simply not aware of the dead letter sub-queue’s existence because it does not show up as a queue on the Server Explorer pane in Visual Studio. Bit of an oversight, that one!

Having a human flush this queue out every now and then is not an acceptable solution for most systems. What if your system has a sudden spike in dead letters? Maybe a rogue system was submitting messages to your queues using an old serialization format or something? What if there were millions of these messages? Your application is going to be offline quicker than any human can react. So you need to build this logic into your application itself. This can be done by a watchdog process that keeps track of how many messages are being put onto the dead letter queue and actively ensures it is frequently pruned. This is very much a non-trivial problem.
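
For reference, reading the sub-queue itself is the easy bit; it is the pruning policy that is hard. Something like this (a sketch; the connection string and queue path are placeholders):

open System
open Microsoft.ServiceBus.Messaging

// 'connectionString' is assumed to already exist; "email-outbox" is an illustrative queue path.
let dlqPath = QueueClient.FormatDeadLetterPath("email-outbox")
let dlq = QueueClient.CreateFromConnectionString(connectionString, dlqPath, ReceiveMode.ReceiveAndDelete)

let rec drain () =
    match dlq.Receive(TimeSpan.FromSeconds 5.) with
    | null -> () // sub-queue is empty for now
    | msg ->
        // trace the approximate content somewhere cheap; ReceiveAndDelete has already removed it
        printfn "Pruned dead letter %s" msg.MessageId
        drain ()

// the watchdog would call drain () on whatever schedule it decides is sensible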

Alternatively you can avoid the use of dead lettering entirely. This seems drastic but it may not be such a bad idea actually. You should consider whether you actually care enough about retaining that carbon-copy of a message to keep it around as-is. Ask yourself whether just some simple and traditional trace/log output of the problem and the approximate message content would be sufficient. Dead lettering is inherently a human concept that is analogous to “lost and found” or a “spam folder”. So perhaps for fully automated systems that want as little human influence or administrative effort as possible, avoiding dead lettering entirely is the best choice.
