IoC containers: where you define the seams of applications
A few colleagues asked me to do a quick write-up about the proper use of an IoC container, particularly concerning which types you DO and DON’T register into the container. So here we go:
Things that you do and don’t wire up into an IoC container.
The big ones, the seams of the application
Components that are inherently cross-cutting concerns, and need to be “available everywhere” for possible injection. Things like:
- Logging, tracing and instrumentation
- Authentication and authorization
- Configuration
- Major application services (this includes things like the Controllers in an MVC web app)
Components that will be modularised as plug-ins / add-ins, things that get loaded dynamically. Consider using MEF as the discovery mechanism of these components.
Services with multiple implementations that can be “dynamically selected” through some means (app.config, differing registrations per DEBUG and RELEASE modes at compile-time, per-tenant configuration, etc.)
The little ones, the stylistic ones and where you “lean” on the power of your container to provide infrastructure services or as a development aid
Components that require lifetime scoping or management (transactions, sessions, units of work) and other IDisposable-like things that are longer lived than just a one-off use.
Components that are single instance. Never write “static” components.
Components that require testing / mocking out, etc. Note: I consider this to be a “development aid” and not at all mandatory.
When you want an “automatic factory” (Autofac isn’t called that for no reason!). A simple inline Func<ISomeService> expression is cleaner than going down the stereotypical Java “Enterprise” route of manually rolling out a SomeServiceFactory class each time. Though that’s more a result of the sad fact that they still don’t have lambdas.
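To make the “automatic factory” point a little more concrete, here is a minimal sketch. It assumes the Autofac package is referenced and relies only on Autofac's implicit support for resolving Func<T>. The types ISomeService, SomeService and ReportGenerator are hypothetical names invented for this illustration.

```csharp
using System;
using Autofac;

// Hypothetical service types, invented purely for this illustration.
public interface ISomeService { Guid Id { get; } }

public class SomeService : ISomeService {
    private readonly Guid id = Guid.NewGuid();
    public Guid Id { get { return id; } }
}

// The consumer asks for a Func<ISomeService> instead of a hand-written
// SomeServiceFactory class. Autofac supplies the delegate automatically.
public class ReportGenerator {
    private readonly Func<ISomeService> createService;

    public ReportGenerator(Func<ISomeService> createService) {
        if (createService == null)
            throw new ArgumentNullException("createService");
        this.createService = createService;
    }

    public ISomeService NewService() {
        // Each call goes back through the container registration,
        // so the registered lifetime (per dependency here) still applies.
        return createService();
    }
}

public static class AutoFactoryDemo {
    public static IContainer Build() {
        var builder = new ContainerBuilder();
        builder.RegisterType<SomeService>().As<ISomeService>(); // new instance per resolve
        builder.RegisterType<ReportGenerator>();
        return builder.Build();
    }

    public static void Main() {
        using (var container = Build()) {
            var generator = container.Resolve<ReportGenerator>();
            var first = generator.NewService();
            var second = generator.NewService();
            // Two factory calls should yield two distinct instances.
            Console.WriteLine(first.Id != second.Id);
        }
    }
}
```

Note that ReportGenerator never references the container itself. It only sees a plain delegate, which keeps it trivially testable with a hand-written lambda.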
And now the things that you leave out of the container
Anything that is never, and never has any need to be, referenced outside of the module it is within.
Implementation details of a module. Your container registrations should be the facade that hides the complexities of how that module works.
Things that are essentially just DTOs, entities, POCOs, other dumb types, etc.
Little utility, helper functions.
Note: I refer to “module” a few times. This is in no way a direct reference to an assembly or package. It’s more a reference to a namespace, because components typically reside within a relatively self-contained namespace with a container registration module.
Cardinal rules
Never ever call a “static” Kernel.Get / Resolve, or whatever equivalent your container might expose, anywhere. This is not dependency injection. It is service location. Which is a whole different pattern entirely. Autofac is quite neat in that it’s one of very few containers that actually does not, out of the box, provide any sort of “static” resolution/service location function. And that is good.
Only call Get / Resolve methods in your bootstrap code at the root of your object graph. And even then, there should be fewer than a handful of such calls. If you can get it down to just one, then you’ve done well and you probably have an object graph that is very well expressed.
Always keep the object graph in the back of your mind. It’s a shame, in my opinion, that containers tend to keep this information hidden away in their internals. The only time you get a glimpse of it is in the exception message for when you’ve inadvertently introduced a circular dependency. Things could be so much better than this.
If you have a component that requires injection of more than about five dependencies, then it should start coming onto your “radar of suspiciousness”. If it reaches about eight to nine dependencies you should almost certainly consider refactoring it and, probably, the wider namespace or module as a whole. I often see this happen on Controllers in MVC applications; the so-called “fat controller” problem. Thankfully, because the dependencies are already “well expressed” (it’s just that there are too many of them), refactoring such problem areas of the codebase is normally a relatively straightforward task.
Nowhere except your bootstrapper and container modules should reference the container, i.e. its namespaces. Arguably, your bootstrapper and container modules can be in a totally separate assembly by themselves and only that assembly holds references to your container’s assemblies. If you’re seeing namespace imports for your container all over your projects then something is very badly wrong.
Avoid the use of “service locator injections”, such as IComponentContext in Autofac. This is one of the very few ways in which Autofac allows you to shoot yourself in the foot. It’s not quite as bad as a “static” Kernel.Get style service locator, but it’s still pretty damn bad, as it implies you don’t actually know what possible dependencies your component has, which should be impossible. To avoid this, express your dependencies better. If there are multiple instances you wish to dynamically “select” from at runtime then you can roll your own resolution delegate function and lean on your container to implement it. Autofac makes this very easy using its IIndex relationship. For example:
public delegate IMyService MyServiceResolver(string name);

// ... this stuff below goes in your container module ...
Func<IComponentContext, MyServiceResolver> tmp = c => {
    var indexedServices = c.Resolve<IIndex<string, IMyService>>();
    return name => indexedServices[name];
};
builder.Register(tmp)
    .As<MyServiceResolver>();
builder.Register(c => new MyService())
    .As<IMyService>()
    .Keyed<IMyService>(serviceDescription.Name);
// The "keyed on" value is a string in this example.
// But, usefully, it can be any object including value types such as an enum.

// ... any time I want to resolve an IMyService, I can just do this in a constructor:
class SomeOtherComponent {
    private readonly IMyService myService;

    public SomeOtherComponent(MyServiceResolver myServiceResolver) {
        if (myServiceResolver == null)
            throw new ArgumentNullException("myServiceResolver");
        this.myService = myServiceResolver("Fred");
        // Technically this is a form of service location.
        // However, because we have constrained the services that
        // can be resolved to one particular *type*, this does not
        // introduce any bad practices to the codebase.
        //
        // Most importantly, we are not relying on any "static" magic.
        // (Which is the absolute hallmark of truly bad service location.)
        // Nor are we holding any references to the container.
    }
}
Example of a Bootstrapper, Container Module and general structure of your Program Root
This is a little snippet of a relatively well structured IoC server application. I added some relevant comments to it.
public class Program {
    private static IContainer Container { get; set; }
    private static ILog Log { get; set; }
    private static ProgramOptions Options { get; set; }
    private static Lazy<HostEntryPoint> Host { get; set; }

    public static void Main(string[] args) {
        try {
            Options = new ProgramOptions();
            if (!Parser.Default.ParseArguments(args, Options))
                return;
            // Root of the program.
            // Bootstraps the container then resolves two components.
            // One for logging services in the root (this) and the other
            // is the *actual* entry point of the application.
            Container = new Bootstrapper().CreateContainer();
            Log = Container.Resolve<ILog>(TypedParameter.From("boot"));
            Host = Container.Resolve<Lazy<HostEntryPoint>>();
            if (Options.RunAsService)
                RunAsService();
            else
                RunAsConsole();
        } catch (Exception x) {
            // Arguably one of the very few places catching a plain
            // Exception can make sense: at the root of the program.
            // Log is still null if the failure happened before it was resolved.
            if (Log != null)
                Log.FatalException("Unexpected error occurred whilst starting.", x);
            Environment.Exit(100);
        }
        Log.Info("Exiting...");
        Container.Dispose();
        Thread.Sleep(500);
    }
    // ... cut for brevity ...
}
internal class Bootstrapper {
    public virtual IEnumerable<IModule> GetModules() {
        yield return new LoggingModule {
            Console = LoggingMode.TraceOrAbove,
            File = LoggingMode.WarningOrAbove,
            Debug = LoggingMode.Off,
            RegisterNetFxTraceListener = true
            // Container Modules are an excellent place to pass in
            // certain configuration/runtime parameters and options.
            // I prefer to "hard code" things like this until there
            // is a *real* need to expose such things to a config file,
            // and hence the user of the application.
        };
        // These modules can be specified in any order.
        // Container will resolve the object graph at
        // build-time not at registration-time.
        yield return new QueuesModule() { ConcurrentReceivers = 4 };
        yield return new DispatchersModule();
        yield return new HciCommandsModule();
        yield return new MefModulesModule();
        yield return new AzureDataModule() { ConnectionString = "<goes here>" };
    }

    public virtual void RegisterCore(ContainerBuilder builder) {
        builder.Register(c =>
            new HostEntryPoint(
                c.Resolve<MefModuleLogging>(),
                c.Resolve<IEnumerable<IQueueAgent>>(),
                c.Resolve<IEnumerable<IDispatcher>>()))
            .SingleInstance();
        // As well as typically only ever using constructor injection...
        // I prefer to explicitly define the dependency resolutions here, each time.
        // That is, in my opinion, half the point of IoC. You're doing it to keep
        // very close tabs on your dependency graphs. So it should certainly not
        // be the norm that you let the container resolve them through its automagicness.
        // An exception to this rule is dynamically loaded modules (such as MEF assemblies)
        // where you cannot possibly know, at compile-time, what dependencies are required.
    }

    public IContainer CreateContainer() {
        var builder = new ContainerBuilder();
        foreach (var module in GetModules())
            builder.RegisterModule(module);
        RegisterCore(builder);
        return builder.Build();
    }
}
I’m open to feedback and discussion
Azure Table Storage versus SQLite (on Azure Blob Store)
I’ve been trying to decide what storage platform I should use for my application.
It will be storing what are essentially ever-growing (but potentially prune-able past a certain age, say 1 to 3 years) transaction logs. Each record consists of four timestamp values (each 64 bits wide), three 32-bit integer values, and three text fields. Two of the text fields are generally of constrained length (say a maximum of 256 characters); the third is typically longer, but hopefully no more than about 1KB in the worst case.
Having tried out SQLite on my local machine (which has an SSD), I managed to insert 60,000 of these records in about 1 second flat. I was impressed but cautious, because SQLite isn’t really a cloud-ready data store and it would require quite a bit of work in wrapping up with concurrency handling to make it work for what I’d need it to do. But I could not ignore that it was fast.
When I first read up about Azure Table Storage, I was a bit underwhelmed. It just seemed incredibly bloated and inefficient. It uses XML as its serialization format. It uses HTTP/S as its network transport (and there is no fast low-level interface available like there is for Azure Service Bus). If you’ve ever used Protocol Buffers, getting to know Azure Table Storage is a depressing experience. You can see the wastage but there is nothing you can do. Sure, you can override the serialization to remove its reliance on reflection and shorten the property names, but that’s only half the story.
I persisted anyway, and dived into Table Storage to give it a proper go and see what it could do.
I ran into a couple of problems, mostly with the .NET Client API. I was submitting a batch of approx. 600 entities. It came back with a rather vague and puzzling exception:
Microsoft.WindowsAzure.Storage.StorageException was caught
HResult=-2146233088
Message=Unexpected response code for operation : 99
Source=Microsoft.WindowsAzure.Storage
StackTrace:
at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](StorageCommandBase`1 cmd, IRetryPolicy policy, OperationContext operationContext)
at Microsoft.WindowsAzure.Storage.Table.TableBatchOperation.Execute(CloudTableClient client, String tableName, TableRequestOptions requestOptions, OperationContext operationContext)
at Microsoft.WindowsAzure.Storage.Table.CloudTable.ExecuteBatch(TableBatchOperation batch, TableRequestOptions requestOptions, OperationContext operationContext)
at Tests.WazTableSandbox.Write()
Nothing of any worth showed up on Google about this. I dug into it a bit further and noticed the extended exception information mentioned something about “InvalidInput” and “99:One of the request inputs is not valid.” Not really that useful still. Even Googling these gave me no clues as to what was wrong.
I read somewhere that Azure Table Storage batches are limited to 100 entities per batch. So I wrote a quick LINQ GroupBy to batch up my dataset by partition key (yes, that’s another requirement; batches of operations must all be for the same partition key). Fortunately, the exception went away once I was grouping them into batches of 100 correctly. Surely the .NET Client API deserves a better and more self-explanatory exception message for this edge case though? It’s blatantly going to be the first problem any developer encounters when trying to use CloudTable.ExecuteBatch().
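For illustration, the grouping step can be sketched roughly as follows. This is not the code from my test program. LogRecord is a hypothetical stand-in for the real entity type, and the sketch uses nothing beyond plain LINQ. It groups by partition key first and then cuts each group into chunks of at most 100.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a table entity; only the partition key matters here.
public class LogRecord {
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
}

public static class BatchSplitter {
    // The per-batch entity limit mentioned in the text above.
    public const int MaxBatchSize = 100;

    // Groups records by partition key, then cuts each group into
    // consecutive chunks of at most MaxBatchSize records.
    public static List<List<LogRecord>> Split(IEnumerable<LogRecord> records) {
        return records
            .GroupBy(r => r.PartitionKey)
            .SelectMany(g => g
                .Select((record, index) => new { record, index })
                .GroupBy(x => x.index / MaxBatchSize, x => x.record)
                .Select(chunk => chunk.ToList()))
            .ToList();
    }

    public static void Main() {
        // 250 records in partition "A" and 30 in partition "B".
        var records = Enumerable.Range(0, 250)
            .Select(i => new LogRecord { PartitionKey = "A", RowKey = i.ToString() })
            .Concat(Enumerable.Range(0, 30)
                .Select(i => new LogRecord { PartitionKey = "B", RowKey = i.ToString() }));

        // Prints one line per batch: A: 100, A: 100, A: 50, B: 30
        foreach (var batch in Split(records))
            Console.WriteLine("{0}: {1}", batch[0].PartitionKey, batch.Count);
    }
}
```

Each resulting list would then be turned into one batch operation and passed to CloudTable.ExecuteBatch().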
With that solved, I continued with my tests.
My test data was batched up, by partition key, into these batch sizes: 26, 28, 22, 46, 51, 61, 32, 14, 46, 34, 31, 42, 59 and 8.
I then wrote some test code for SQLite that mirrored what I was doing with the Table Storage. I made sure to use a SQLite transaction per batch, so that each batch would be written as an atomic unit. I purposefully gave SQLite an advantage by “preparing” the command (i.e. pre-compiling the byte code for the SQL command).
I deployed my test program onto an Azure VM (“extra small”, if it matters?) and ran it. Here’s what came out:
WazTable
Executing batch of 26
Executing batch of 28
Executing batch of 22
Executing batch of 46
Executing batch of 51
Executing batch of 61
Executing batch of 32
Executing batch of 14
Executing batch of 46
Executing batch of 34
Executing batch of 31
Executing batch of 42
Executing batch of 59
Executing batch of 8
00:00:01.8756798
Sqlite
Executing batch of 26
Executing batch of 28
Executing batch of 22
Executing batch of 46
Executing batch of 51
Executing batch of 61
Executing batch of 32
Executing batch of 14
Executing batch of 46
Executing batch of 34
Executing batch of 31
Executing batch of 42
Executing batch of 59
Executing batch of 8
00:00:03.4291801
So although SQLite was massively faster on my local SSD-powered workstation, it was substantially slower (almost 2x) when running from the Azure VM (and hence on a blob store). This was a bit disappointing but it gives me confidence that I am using the right data storage tool for the job.
You may be wondering why I even considered SQLite as an option in the first place. Well, good question. I am still on the fence as to whether my application will be “full cloud” or just a half-way house that can be installed somewhere without any cloudy stuff involved. That’s why I wanted to investigate SQLite as it’s a standalone database. I might support both, in which case I would use SQLite for non-cloud deployments and Azure Table Storage for cloud deployments. I still find it disappointing how inefficiently Azure Table Storage has been designed. They really need to introduce a lower-level network transport like the one for Service Bus. And a better, XML-less, serialization format.
Three gotchas with the Azure Service Bus
I’ve been writing some fresh code using Azure Service Bus Queues in recent weeks. Overall I’m very impressed. The platform is good and stable, and the Client APIs (at least in the form of Microsoft.ServiceBus.dll that I’ve used) are quite modern in design and layout. It’s only slightly annoying that the Client APIs seem to use the old-fashioned Begin/End async pattern that was perhaps more in vogue back in the .NET 1.0 to 2.0 days. Why not just return TPL Tasks?
However, there have been a few larger gotchas that I’ve discovered which can quite easily turn into non-trivial problems for a developer to safely work around. These are the sort of problems that can inherently change the way your application is designed.
Deferring messages via Defer()
I’m of the opinion that a Service Bus should take care of message redelivery mechanisms itself. For the most part, Azure Service Bus does this really well. But it supports this slightly bizarre type of return notification called deferral (invoked via a Defer() or BeginDefer() method). This basically sets a flag on the message internally so that it will never be implicitly redelivered by the queue to your application. But the message will fundamentally still exist inside the queue and you can even still Receive() it by asking for it by its SequenceId (exposed as the SequenceNumber property in the Client API) explicitly. That’s all good and everything but it leaves your application with a bigger problem. Where does it durably store those SequenceIds so that it knows what messages it has deferred? Sure you could hold them in-memory, that would be the naive approach and seems to be the approach taken by the majority of Azure books and documentation. But that is, frankly, a ridiculous idea and it’s insulting that authors in the fault-tolerant distributed systems space can even suggest such rubbish. The second problem is of course what sort of retry strategy you adopt for that queue of deferred SequenceIds. Then you have to think about the transaction costs (i.e. money!) involved in whatever retry strategy you employ. What if your system has deferred hundreds of thousands or millions of messages? Consider that those deferred messages were outbound e-mails and they were being deferred because your mail server is down for 2 hours. If you were to retry those messages every 5 seconds, that is a lot of Service Bus transactions that you’ll get billed for.
One wonders why the Defer() method doesn’t support some sort of time duration or absolute point in time as a parameter that could indicate to the Service Bus when you actually want that message to be redelivered. It would certainly be a great help and I can’t imagine it would require that much work in the back-end for the Azure guys.
So how do you actually solve this problem?
For now, I have completely avoided the use of Defer() in my system. When I need to defer a message I will simply not perform any return notification for the message and I will allow the PeekLock to expire of its own accord (which the Service Bus handles itself). This approach has the following application design side effects:
- The deferral and retry logic is performed by the Service Bus entirely. My application does not need to worry about such things and the complexities involved.
- The deferral retry time is constant and is defined at queue description level. It cannot be controlled dynamically on a per message basis.
- Your queue’s MaxDeliveryCount, LockDuration and DefaultTimeToLive parameters will become inherently coupled and will need to be explicitly controlled. (MaxDeliveryCount x LockDuration) will determine how long a message can be retried for and at what interval. If your LockDuration is 1.5 minutes and you want to retry the message for 1 day then MaxDeliveryCount = (1 day / 1.5 minutes) = 960.
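The arithmetic in that last point can be written down as a small helper. This is only a sketch of the calculation. The method name is made up for illustration, and it assumes the message is redelivered as soon as each lock expires.

```csharp
using System;

public static class RetryWindow {
    // Number of deliveries needed so that lock expiries alone keep a
    // message retrying for the whole window (rounded up).
    public static int MaxDeliveryCountFor(TimeSpan retryWindow, TimeSpan lockDuration) {
        if (lockDuration <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException("lockDuration");
        return (int)Math.Ceiling(retryWindow.TotalSeconds / lockDuration.TotalSeconds);
    }

    public static void Main() {
        var count = MaxDeliveryCountFor(TimeSpan.FromDays(1), TimeSpan.FromMinutes(1.5));
        Console.WriteLine(count); // 960, matching the figure in the text
    }
}
```

The queue's DefaultTimeToLive would also need to be at least as long as the retry window, otherwise the message expires before the delivery count is exhausted.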
This is a good stop-gap measure whilst I am iterating quickly. For small systems it can perhaps even be a permanent solution. But sooner or later it will cause problems for me and will need to be refactored.
I think the key to solving this problem is gaining a better understanding of the reason why the message is being deferred in the first place, which gives you more control. In my particular application it can only happen when, for instance, an e-mail server is down or unreachable. So maybe I need some sort of watchdog in my application that (eventually) detects when the e-mail server is down and then actively stops trying to send messages, and indeed maybe even puts the brakes on actually Receive()ing messages from the queue in the first place. For those messages that have been received already, maybe there should be a separate queue called something like “email-outbox-deferred” (note the suffix). Messages queued on this would not actually be the real message but simply a pointer record that points back to the SequenceId of the real one on the “email-outbox” queue. When the watchdog detects that the e-mail server has come back up then it can start opening up the taps again. Firstly it would perform a Receive() loop on the “email-outbox-deferred” queue and attempt to reprocess those messages by following the SequenceId pointers back to the real queue. If it manages to successfully send the e-mail then it can issue a Complete() on both the deferred pointer message and the real message, to entirely remove it from the job queue. Otherwise it can Abandon() them both and the watchdog can start from square one by waiting to gain confidence in the e-mail server’s health before retrying again.
The key to this approach is the watchdog. The watchdog must act as a type of valve that can open and close the Receive() loops on the two queues. Without this component you are liable to create long unconstrained loops or even infinite-like loops that will cause you to have potentially massive Service Bus transaction costs on your next bill from Azure.
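As a rough sketch of what such a valve could look like (not code from my system; the DeliveryValve type and its members are hypothetical names), the receive loops would check IsOpen before each Receive(), report the outcome of each send attempt, and the watchdog would periodically call TryReopen with whatever cheap health probe suits the e-mail server.

```csharp
using System;

// A tiny "valve" that the receive loops consult before calling Receive().
public class DeliveryValve {
    private readonly int failureThreshold;
    private readonly object gate = new object();
    private int consecutiveFailures;
    private bool open = true;

    public DeliveryValve(int failureThreshold) {
        if (failureThreshold < 1)
            throw new ArgumentOutOfRangeException("failureThreshold");
        this.failureThreshold = failureThreshold;
    }

    public bool IsOpen {
        get { lock (gate) return open; }
    }

    public void ReportSuccess() {
        lock (gate) consecutiveFailures = 0;
    }

    // Closes the valve once enough consecutive failures have been seen.
    public void ReportFailure() {
        lock (gate) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold)
                open = false;
        }
    }

    // Called periodically by the watchdog while the valve is closed.
    // Returns true if the valve is open after the call.
    public bool TryReopen(Func<bool> probe) {
        if (probe == null)
            throw new ArgumentNullException("probe");
        lock (gate) {
            if (open) return true;
        }
        var healthy = probe(); // run outside the lock, as a probe may be slow
        if (!healthy) return false;
        lock (gate) {
            consecutiveFailures = 0;
            open = true;
        }
        return true;
    }
}

public static class ValveDemo {
    public static void Main() {
        var valve = new DeliveryValve(3);
        valve.ReportFailure();
        valve.ReportFailure();
        valve.ReportFailure();
        Console.WriteLine(valve.IsOpen);                 // False
        Console.WriteLine(valve.TryReopen(() => false)); // False
        Console.WriteLine(valve.TryReopen(() => true));  // True
        Console.WriteLine(valve.IsOpen);                 // True
    }
}
```

A real watchdog would want a back-off between probes so that the health checks themselves do not become the unconstrained loop.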
I believe what I have described here is considered to be a SEDA or “Staged event-driven architecture”. Documentation of this pattern is a bit thin on the ground at the moment. Hopefully this will start to change as enterprise-level PaaS cloud applications gain more and more traction. But if anyone has any good book recommendations… ping me a message.
I’d be interested in learning more about message deferral and retry strategies, so please comment!
Transient fault retry logic is not built into the Client API
Transient faults are those that imply there is probably nothing inherently wrong with your message. It’s just that the Service Bus is perhaps too busy or network conditions dictate that it can’t be handled at this time. Fortunately the Client API includes a nice IsTransient boolean property on every MessagingException. Making good use of this property is harder than it first appears though.
All the Azure documentation that I’ve found makes use of the (rather hideous) Enterprise Library Transient Fault Handling Application Block. That’s all fine and good. But who honestly wants to be wrapping up every Client API action they do in that? Sure, you can abstract it away again yourself, but where does it end?
It seems odd that the Client API doesn’t have this built in. Why when you invoke some operation like Receive() can’t you specify a transient fault retry strategy as an optional parameter? Or hell, why can’t you just specify this retry strategy at a QueueClient level?
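To show what “abstracting it away yourself” tends to look like, here is a minimal sketch of a retry wrapper. It is not part of the Client API or of Enterprise Library. TransientRetry and its parameters are hypothetical, and the fixed delay is an assumption made to keep the example short.

```csharp
using System;
using System.Threading;

public static class TransientRetry {
    // Runs the operation, retrying only when isTransient says the failure
    // is worth retrying. The last failure is rethrown once attempts run out.
    public static T Execute<T>(Func<T> operation, Func<Exception, bool> isTransient,
                               int maxAttempts, TimeSpan delay) {
        if (operation == null) throw new ArgumentNullException("operation");
        if (isTransient == null) throw new ArgumentNullException("isTransient");
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        for (var attempt = 1; ; attempt++) {
            try {
                return operation();
            } catch (Exception x) {
                if (attempt >= maxAttempts || !isTransient(x))
                    throw;
                Thread.Sleep(delay);
            }
        }
    }
}

public static class TransientRetryDemo {
    public static void Main() {
        var calls = 0;
        var result = TransientRetry.Execute(
            () => {
                calls++;
                // Simulate two transient faults followed by a success.
                if (calls < 3) throw new TimeoutException("simulated transient fault");
                return "received";
            },
            x => x is TimeoutException,
            5,
            TimeSpan.FromMilliseconds(10));
        Console.WriteLine("{0} after {1} calls", result, calls);
    }
}
```

With Service Bus the predicate would instead check for a MessagingException whose IsTransient property is true, and a production version would probably want an increasing delay rather than a fixed one.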
I remain hopeful that this is something the Azure guys will fix soon.
Dead lettering is not the end
You may think that once you’ve dead lettered a message that you’ll not need to worry about it again from your application. Wrong.
When you dead letter a message it is actually just moved to a special sub-queue of your queue. If left untouched, it will remain in that sub-queue forever. Forever. Yes, forever. Yes, a memory leak. Eventually this will bring down your application because your queue will run into its memory limit (which can only be a maximum of 5GB). Annoyingly, most developers are simply not aware of the dead letter sub-queue’s existence because it does not show up as a queue on the Server Explorer pane in Visual Studio. Bit of an oversight that one!
Having a human flush this queue out every now and then is not an acceptable solution for most systems. What if your system has a sudden spike in dead letters? Maybe a rogue system was submitting messages to your queues using an old serialization format or something? What if there were millions of these messages? Your application is going to be offline quicker than any human can react. So you need to build this logic into your application itself. This can be done by a watchdog process that keeps track of how many messages are being put onto the dead letter queue and actively ensures it is frequently pruned. This is very much a non-trivial problem.
Alternatively you can avoid the use of dead lettering entirely. This seems drastic but it may not be such a bad idea actually. You should consider if you actually care enough about retaining that carbon-copy of a message to keep it around as-is. Ask yourself whether just some simple and traditional trace/log output of the problem and approximate message content would be sufficient? Dead lettering is inherently a human concept that is analogous to “lost and found” or a “spam folder”. So perhaps with fully automated systems that desire as little human influence or administrative effort as possible then avoiding dead lettering entirely is the best choice.
Adding the Git changeset hash on every build
Version numbers in the traditional sense have been obsolete ever since DVCS arrived on the scene. They’re still useful for literal human consumption but I prefer having the full Git (or Hg) changeset hash available on the assembly as well so I can literally do CTRL+SHIFT+G on GitExtensions (or the equiv. on TortoiseHg etc) and paste in the changeset hash to go directly to it.
Setting this up is fairly simple. Firstly, ensure your Git.exe is in your PATH. It generally will be if you installed Git with default/recommended settings. Second, import the NuGet package for MSBuild Community Tasks. Make sure you import the older version available here as there is a bug in the newer versions where the GitVersion task is broken. Third, create a VersionAssemblyInfo.targets file in your solution root with the following content:
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <Import Project="$(SolutionDir)\.build\MSBuild.Community.Tasks.targets" />
  <Target Name="UpdateVersionAssemblyInfo" BeforeTargets="BeforeBuild">
    <PropertyGroup>
      <Major>1</Major>
      <Minor>0</Minor>
      <Build>0</Build>
      <Revision>0</Revision>
      <GitHash>unknown</GitHash>
    </PropertyGroup>
    <GitVersion LocalPath="$(SolutionDir)" Short="false">
      <Output TaskParameter="CommitHash" PropertyName="GitHash" />
    </GitVersion>
    <AssemblyInfo
      CodeLanguage="CS"
      OutputFile="$(SolutionDir)\VersionAssemblyInfo.cs"
      AssemblyInformationalVersion="$(Major).$(Minor).$(Build).$(Revision) (git $(GitHash))"
      AssemblyVersion="$(Major).$(Minor).$(Build).$(Revision)"
      AssemblyFileVersion="$(Major).$(Minor).$(Build).$(Revision)" />
  </Target>
</Project>
Create a VersionAssemblyInfo.cs in your solution root, and create a “Link” to this file in all your projects.
Then in each project file you just need to add a new import, typically underneath the “Microsoft.CSharp.targets” import.
<Import Project="$(SolutionDir)\VersionAssemblyInfo.targets" />
Now rebuild your project(s). If you check the disassembly with a tool like JustDecompile or Reflector and look at the assembly attributes you should see that the (little known and rarely used) AssemblyInformationalVersionAttribute is there and contains both the human-readable major.minor.build.revision and the Git changeset hash.
You may now choose to replace the various areas of your application that display version information with this attribute instead.
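Reading the attribute back at runtime takes only a few lines of reflection. The following is a minimal sketch. VersionInfo and its method names are made up for illustration, and the parsing assumes the “major.minor.build.revision (git hash)” format produced by the targets file above.

```csharp
using System;
using System.Reflection;

public static class VersionInfo {
    // Returns the AssemblyInformationalVersion string, or null if absent.
    public static string GetInformationalVersion(Assembly assembly) {
        var attributes = assembly.GetCustomAttributes(
            typeof(AssemblyInformationalVersionAttribute), false);
        return attributes.Length == 0
            ? null
            : ((AssemblyInformationalVersionAttribute)attributes[0]).InformationalVersion;
    }

    // Pulls the hash out of a string shaped like "1.0.0.0 (git abc123)".
    // Returns null if the string does not have that shape.
    public static string ExtractGitHash(string informationalVersion) {
        if (informationalVersion == null) return null;
        const string marker = "(git ";
        var start = informationalVersion.IndexOf(marker, StringComparison.Ordinal);
        if (start < 0) return null;
        start += marker.Length;
        var end = informationalVersion.IndexOf(')', start);
        return end < 0 ? null : informationalVersion.Substring(start, end - start);
    }

    public static void Main() {
        var version = GetInformationalVersion(Assembly.GetExecutingAssembly());
        Console.WriteLine(version ?? "(no informational version)");
        Console.WriteLine(ExtractGitHash("1.0.0.0 (git 0123abcd)")); // 0123abcd
    }
}
```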
Campfire requires rework
37signals’ Campfire product is a brilliant example of poor web application design. I would much rather use IRC and forfeit the tiny number of features in Campfire that are actually useful, such as image in-lining and arguably YouTube clip in-lining too (though this one is rarely used for actual work). I can’t remember the last time we used the log history/transcripts feature.
Some of the problems and annoyances we have with Campfire are:
- Switching between rooms requires a full page refresh and is very slow.
- Despite the illusion of fake tabs, you can only be in one room at a time. Unlike an IRC client these fake tabs provide no indication that room activity has occurred. So you must periodically switch between rooms to check for activity. If you need to be in multiple rooms for real then you’ll have to open more tabs in your actual web browser.
- Timestamps for chat messages seem to be grouped together (good in theory) but they only appear after some random amount of time (bad implementation in practice).
- There is no built-in notification/pop-up support. So you have to install hacks like the Kindling for Chrome extension.
- A third of the screen real estate is occupied by pointless stuff that you rarely use. Yes having a user list is important, but having it visible at all times is not.
- All of the “Campfire clients” for Windows are rubbish.
- But the biggest annoyance of all is when you blindly start typing a message and the web application is too dumb to auto-focus the cursor on the message entry text box. And you can’t simply and predictably Tab-key your way to it easily. The only constant-time way to reach it is to focus it with the mouse. Bear in mind that this is a feature that every IM client since about 1999 has had baked in.
Is it time for 37signals to drink some of their own Kool-Aid and actually “rework” their Campfire service?
Targeting Mono in Visual Studio 2012
These steps are known good on my Windows 8 machine, with Visual Studio 2012 w/ Update 1 and Mono 2.10.9.
1. Install Mono for Windows, from
   http://www.go-mono.com/mono-downloads/download.html
   Choose a decent path, which for me was C:\Program Files (x86)\Mono-2.10.9
2. Load an elevated administrative Command Prompt (Top tip: On Windows 8, hit WinKey+X then choose “Command Prompt (Admin)”)
3. From this Command Prompt, execute the following commands (in order):
   $ cd "C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.0\Profile"
   $ mklink /d Mono "C:\Program Files (x86)\Mono-2.10.9\lib\mono\4.0"
   $ cd Mono
   $ mkdir RedistList
   $ cd RedistList
   $ notepad FrameworkList.xml
4. Notepad will start and ask about creating a new file, choose Yes.
5. Now paste in this text and Save the file:
   <?xml version="1.0" encoding="UTF-8"?>
   <FileList ToolsVersion="4.0" RuntimeVersion="4.0" Name="Mono 2.10.9 Profile" Redist="Mono_2.10.9">
   </FileList>
6. From the same Command Prompt, type:
   $ regedit
7. In the Registry Editor, navigate to:
   HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v4.0.30319\SKUs\
   and create a new Key folder called .NETFramework,Version=v4.0,Profile=Mono
8. Now navigate to
   HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\.NETFramework\v4.0.30319\SKUs\
   and create the same Key folder again here (this step is only necessary for x64 machines, but since all software developers use those then you’ll probably need to do it!)
9. Now load up VS2012 and a project. Go to the Project Properties (Alt+Enter whilst having it selected in Solution Explorer).
10. Choose the Application tab on the left and look for the “Target framework” drop-down list.
11. On this list you should see an entry called “Mono 2.10.9 Profile”.
12. Select it, and Visual Studio should then convert your project to target the Mono framework. You’ll notice that it will re-reference all your various System assemblies and if you study the filenames they will point to the Mono ones that were created during Step #3.
Note: I was scratching my head at first as I kept getting an error from Visual Studio saying:
This application requires one of the following versions of the .NET Framework:
.NETFramework,Version=v4.0,Profile=Mono
Do you want to install this .NET Framework version now?
Turns out that even on an x64 machine you MUST add both Registry key SKUs (see Steps #7 and #8). It is not enough to just add the Wow6432Node key, you must add the other as well. I can only assume this is because VS2012 is a 32-bit application. But maybe it’s also affected by whether you’re compiling to Any CPU, x64 or x86… who knows. It doesn’t really matter as long as this fixes it, which it does!
Building and executing your first program
Now that your development environment is nicely set up, you can proceed and build your first program.
The Mono equivalent of MSBuild is called XBuild (not sure why they didn’t call it MBuild or something!). You can build your .csproj by doing the following:
- Load the Mono Command Prompt (it will be on your Start Menu/Screen, just search for “mono”).
- Change directory to your project folder.
- Execute the following command to build your project using the Mono compiler:
$ xbuild /p:TargetFrameworkProfile=""
Note: You must specify the blank TargetFrameworkProfile parameter, as otherwise the compiler will issue warnings along the lines of:
Unable to find framework corresponding to the target framework moniker ‘.NETFramework,Version=v4.0,Profile=Mono’. Framework assembly references will be resolved from the GAC, which might not be the intended behavior.
- Hopefully you’ll not have any errors from the build…
- Now you can run your program using the Mono runtime, to do this:
$ mono --gc=sgen "bin\Debug\helloworld.exe"
Note: You'll definitely want to use the "Sgen" garbage collector (hence the parameter) as the default one in Mono is unbearably slow.
- You can do a quick “smoke test” to verify everything is in order with both your compilation and execution. Have your program execute something like:
Console.Write(typeof (Console).Assembly.CodeBase);
… and this should print out a path similar to:
file:///C:/PROGRA~2/MONO-2~1.9/lib/mono/4.0/mscorlib.dll
I’ve no idea why it prints it out using 8.3 filename format, but there you go! You’ll notice that if you run your program outside of the Mono runtime then it will pick up the Microsoft CLR version from the GAC.
Simulating the P of CAP on a Riak cluster
When developing and testing a distributed system, one of the essential concerns you will deal with is eventual consistency (EC). There are plenty of articles covering EC so I’m not going to dwell on that much here. What I am going to talk about is testing, particularly of the automated kind.
Testing an eventually consistent system is difficult because everything is transient and unpredictable. Unless you tune your consistency values like N and DW then you’re offered no guarantees about the extent of propagation of your commit around the cluster. And whilst consistency tuning may be acceptable for some tests, it most definitely won’t be acceptable for tests that cover concurrent write concerns such as your sibling resolution protocols.
What is “partition tolerance”?
This is where a single node or a group of nodes in the cluster becomes segregated from the rest of the cluster. I liken it to a “net split” on an IRC network. The system continues operating but it has been split into two or more parts. When a node has become segregated from the rest of the cluster it does not necessarily mean that its clients cannot reach it either. Therefore all nodes can continue to perform writes on the dataset.
Generally speaking, a partition event is a transient situation. It may last a few seconds, a few minutes, hours, days or even weeks. But the expectation is that eventually the partition will be repaired and the cluster returned to full health.
In the diagram (Fig. 1) there is a cluster comprised of three nodes, and this is the sequence of events:
- N2 becomes network partitioned from N1 and N3. N1 and N3 can continue replicating between one another.
- Client(s) connected to either N1 or N3 perform two writes (indicated by the green and orange).
- Client(s) connected to N2 perform three writes (purple, red, blue).
- When the network partition is resolved, the cluster begins to heal by merging the commits between the nodes.
- The two commits made via N1 and N3 (which were already replicated between those two nodes) are propagated onto N2.
- The purple, red and blue commits on N2 are propagated onto N1 and N3.
- The cluster is now fully healed.
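The heal step in that sequence amounts to each side of the partition receiving whatever commits it is missing, which can be pictured as a set union. Here is a toy sketch in shell, where commits are just colour names and `merge_commits` is a made-up helper. Real Riak reconciles writes with vector clocks and siblings rather than a plain union, so treat this purely as a picture of the outcome:

```shell
#!/bin/sh
# Toy model only: a commit is a colour name, a node's state is a list of them.
# merge_commits prints the sorted union of every list it is given.
merge_commits() {
    # $* is deliberately unquoted so each list splits into separate words.
    printf '%s\n' $* | sort -u
}

majority="green orange"      # written via N1 or N3 during the partition
minority="purple red blue"   # written via N2 during the partition

# After healing, every node holds the union of both sides.
merge_commits "$majority" "$minority"
```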
Simulating a partition
I have a Riak cluster of three nodes running on Windows Azure and I needed some way to deterministically simulate a network partition scenario. The solution that I came up with was quite simple. I basically wrote some iptables scripts that temporarily firewalled certain IP traffic on the LAN in order to prevent the selected node from communicating with any other nodes.
To erect the partition:
# Simulates a network partition scenario by blocking all TCP LAN traffic for a node.
# This will prevent the node from talking to other nodes in the cluster.
# The rules are not persisted, so a restart of the iptables service (or indeed the whole box) will reset things to normal.
#
# First add special exemption to allow loopback traffic on the LAN interface.
# Without this, riak-admin gets confused and thinks the local node is down when it isn't.
sudo iptables -I OUTPUT -p tcp -d $(hostname -i) -j ACCEPT
sudo iptables -I INPUT -p tcp -s $(hostname -i) -j ACCEPT
#
# Now block all other LAN traffic.
sudo iptables -I OUTPUT 2 -p tcp -d 10.0.0.0/8 -j REJECT
sudo iptables -I INPUT 2 -p tcp -s 10.0.0.0/8 -j REJECT
To tear down the partition:
# Restarts the iptables service, thereby resetting any temporary rules that were applied to it.
sudo service iptables restart
Disclaimer: These are just rough shell scripts designed for use on a test bed development environment.
I then came up with a fairly neat little class that wraps up the concern of a network partition:
using System;
internal class NetworkPartition : IDisposable {
    private readonly string _nodeName;
    private bool _closed;
    private NetworkPartition(string nodeName) {
        _nodeName = nodeName;
    }
    /// <summary>
    /// Creates a temporary network partition by segregating (at the IP firewall level) a particular node from all other nodes in the cluster.
    /// </summary>
    /// <param name="nodeName">The name of the node to be network partitioned. The name must exist as a PuTTY "Saved Session".</param>
    /// <returns>An object that can be disposed when the partition is to be removed.</returns>
    public static IDisposable Create(string nodeName) {
        var np = new NetworkPartition(nodeName);
        // Plink is the wrapper around PuTTY's plink.exe described further down.
        Plink.Execute("simulate-network-partition.sh", nodeName);
        return np;
    }
    // Removes the partition by resetting iptables; safe to call more than once.
    private void Close() {
        if (_closed)
            return;
        Plink.Execute("restart-iptables.sh", _nodeName);
        _closed = true;
    }
    public void Dispose() {
        Close();
    }
}
Which allows me to write test cases in the following way:
RingStatus.AssertOkay("riak03");
using (NetworkPartition.Create("riak03")) {
    RingStatus.AssertDegraded("riak03");
    // TODO: Do other stuff here whilst riak03 is network partitioned from the rest of the cluster.
}
RingReady.Wait("riak03");
RingStatus.AssertOkay("riak03");
Yes, RingStatus and RingReady aren’t documented here. But they’re pretty simple.
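As a rough idea of what a wait like RingReady.Wait involves, here is a polling sketch in shell, meant to run on the node itself rather than through plink. It is not the C# implementation from the project. `wait_until` is a hypothetical helper, and the usage line assumes that `riak-admin ringready` exits non-zero until all nodes agree on the ring; if that does not hold on your version, inspect its output instead.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND roughly once per second until it exits 0,
# or gives up (returning 1) once TIMEOUT_SECONDS retries have been used.
wait_until() {
    timeout=$1
    shift
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example usage (assumption: non-zero exit status while the ring is unsettled):
#   wait_until 60 riak-admin ringready || echo "ring did not settle" >&2
```

Keeping the command as a parameter means the same helper can poll any other health check that reports through its exit status.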
Obviously as part of this work I had to write a quick and dirty wrapper around the PuTTY plink.exe tool. This tool is basically a CLI version of PuTTY and it is very good for scripting automated tasks.
My solution could be improved to support creating partitions that consist of more than just one node, but for me the added value of this would be very small at this stage. Maybe I will add support for this later when I get into stress testing territory!
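One way the multi-node case might look, as a sketch rather than anything tested on a cluster: every node on the same side of the partition exempts all members of its group (itself included) before rejecting the rest of the LAN. `partition_rules` and the example addresses are hypothetical, and the function only prints the iptables commands so they can be reviewed before being run.

```shell
#!/bin/sh
# partition_rules LAN_CIDR GROUP_IP...
# Prints (does not run) the iptables commands for one node of a partition group.
# GROUP_IP... must list every node on this side of the partition, including
# the local node's own address.
partition_rules() {
    lan_cidr=$1
    shift
    count=0
    for ip in "$@"; do
        # Each ACCEPT is inserted at the head of its chain.
        echo "iptables -I OUTPUT -p tcp -d $ip -j ACCEPT"
        echo "iptables -I INPUT -p tcp -s $ip -j ACCEPT"
        count=$((count + 1))
    done
    # The REJECT rule has to sit directly after the ACCEPT rules in each chain.
    reject_pos=$((count + 1))
    echo "iptables -I OUTPUT $reject_pos -p tcp -d $lan_cidr -j REJECT"
    echo "iptables -I INPUT $reject_pos -p tcp -s $lan_cidr -j REJECT"
}
```

It would be run on each node of the group, for example `partition_rules 10.0.0.0/8 10.0.0.4 10.0.0.5 | sudo sh`. Given a single address, it prints the same four rules as the single-node script.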
Source
You can view it over on GitHub; the namespace is tentatively called “ClusterTools”. Bear in mind it’s currently held inside another project but if there’s enough interest I will make it standalone. There has been talk on the CorrugatedIron project (which is a Riak client library for .NET) about starting up a Contrib library, so maybe this could be one of its first contributions.
Riak timing out during startup
Recently I’ve been working on some interesting projects involving the eventually consistent Riak key-value database. Today I encountered a puzzling issue with a fresh cluster I was deploying. I had seemingly done everything exactly as I had in the past, yet something was causing it to fail to start up when in service mode.
[administrator@riak01 ~]$ sudo service riak start
Starting Riak: Riak failed to start within 15 seconds,
see the output of 'riak console' for more information.
If you want to wait longer, set the environment variable
WAIT_FOR_ERLANG to the number of seconds to wait.
[FAILED]
Bizarrely, in console mode it would start up fine. This led me to believe it was some sort of user or permissions issue but I wasn’t totally sure. Perhaps I had accidentally executed Riak as another user and some sort of locking file was created? Or was it perhaps a performance issue with the “Shared Core” Azure VMs I was using?
First I followed Riak’s advice by increasing the WAIT_FOR_ERLANG environment variable, trying first 30 and then 60 seconds. But this made no difference at all. I’m not even sure Riak was using my new value, as it still kept on printing out “15 seconds” as its reason for failing to start.
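One possible reason the new value never showed up, offered as an assumption rather than something confirmed on this cluster: a variable exported in your own shell does not necessarily reach the init script. As far as I know, sudo resets the environment by default, and the service wrapper on Red Hat style systems launches init scripts with a scrubbed environment. The sketch below only demonstrates the general effect using `env -i`; the helper names are made up.

```shell
#!/bin/sh
# What a child process sees when it inherits the caller's environment.
child_sees() {
    /bin/sh -c 'echo "${WAIT_FOR_ERLANG:-unset}"'
}

# What a child process sees when it is started with an empty environment,
# which is roughly what an environment-resetting wrapper does.
scrubbed_child_sees() {
    env -i /bin/sh -c 'echo "${WAIT_FOR_ERLANG:-unset}"'
}

export WAIT_FOR_ERLANG=60
child_sees            # the exported value is inherited
scrubbed_child_sees   # the value is gone
```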
I researched some more and many places on the interweb were suggesting to purge the /var/lib/riak/ring/ directory (don’t do this if you have valuable data stored on the Riak instance). I tried this, but it also had no effect.
But it turned out that the solution was incredibly simple. Riak had created some sort of temporary lock directory at /tmp/riak. All I had to do was delete this directory and, hey presto, Riak would now start perfectly fine as a service!
$ sudo rm -r /tmp/riak
There may be more posts on the subject of Riak soon.
PS: I am using Riak version 1.2.1, on CentOS 6.3.
Why doesn’t C# support lambdas for properties?
Any time I write a C# property these days I can’t help thinking to myself how they would be so much cleaner if they supported a (syntactically restricted) form of lambda expression on both the getter and setter.
Consider this:
public DateTime Timestamp {
    get {
        return Settings.Default.Timestamp;
    }
    set {
        Settings.Default.Timestamp = value;
        Settings.Default.Save();
    }
}
Would you agree that it looks cleaner like the following?
public DateTime Timestamp {
    get => Settings.Default.Timestamp;
    set => {
        Settings.Default.Timestamp = value;
        Settings.Default.Save();
    }
}
The biggest seller for it, for me at least, is that it removes a largely superfluous “return” statement on the getter.
C# properties are all about the syntactic sugar, so why not go the last mile?
WebSocket servers on Windows Server
This is a slight continuation of the previous WebSockets versus REST… fight! post.
Buoyed by enthusiasm for WebSockets, I set about implementing a simple test harness of a WebSockets server in C#.NET using System.Net.HttpListener. Unfortunately, things did not go well. It turns out that HttpListener (and indeed, the underlying HTTP Server API a.k.a. http.sys) cannot be used at all to develop a WebSockets server on current versions of Windows. http.sys is simply too strict in its policing of what it believes to be correct HTTP protocol.
In an IETF discussion thread, a Microsoft fellow called Stefan Schackow was quoted as saying the following:
The current technical issue for our stack is that the low-level Windows HTTP driver that handles incoming HTTP request (http.sys) does not recognize the current websockets format as having a valid entity body. As you have noted, the lack of a content length header means that http.sys does not make the nonce bytes in the entity available to any upstack callers. That’s part of the work we will need to do to build websockets support into http.sys. Basically we need to tweak http.sys to recognize what is really a non-HTTP request, as an HTTP request.
Implementation-wise this boils down to how strictly a server-side HTTP listener interprets incoming requests as HTTP. For example a server stack that instead treats port 80 as a TCP/IP socket as opposed to an HTTP endpoint can readily do whatever it wants with the websockets initiation request.
For our server-side HTTP stack we do plan to make the necessary changes to support websockets since we want IIS and ASP.NET to handle websockets workloads in the future. We have folks keeping an eye on the websockets spec as it progresses and we do plan to make whatever changes are necessary in the future.
This is a damn shame. As it stands right now, Server 2008/R2 boxes cannot host WebSockets. At least, not whilst sharing ports 80 and 443 with IIS web server. Because, sure, you could always write your WebSocket server to bind to those ports with a raw TCP socket and rewrite a ton of boilerplate HTTP code that http.sys can already do, and then put up with the fact that you can’t share the port with IIS on the same box. This is something that most people, me included, do not want to do.
Obviously it isn’t really anyone’s fault because back in the development time frame of Windows 7 and Server 2008/R2 (between 2006 and 2009) they could not have foreseen the WebSockets standard and the impact it might have on the design of APIs for HTTP servers.
The good thing is that Windows 8 Developer Preview seems to have this covered. According to the MSDN documentation, the HTTP Server API’s HttpSendHttpResponse function supports a new special flag called HTTP_SEND_RESPONSE_FLAG_OPAQUE that seems to suggest it will put the HTTP session into a sort of “dumb mode” whereby you can pass-thru pretty much whatever you want and http.sys won’t interfere:
HTTP_SEND_RESPONSE_FLAG_OPAQUE
Specifies that the request/response is not HTTP compliant and all subsequent bytes should be treated as entity-body. Applications specify this flag when it is accepting a Web Socket upgrade request and informing HTTP.sys to treat the connection data as opaque data.
This flag is only allowed when the StatusCode member of pHttpResponse is 101, switching protocols. HttpSendHttpResponse returns ERROR_INVALID_PARAMETER for all other HTTP response types if this flag is used.
Windows Developer Preview and later: This flag is supported.
Aside from the new System.Net.WebSockets namespace in .NET 4.5, there are also clear indications of this behaviour being exposed in the HttpListener of .NET 4.5 through a new HttpListenerContext.AcceptWebSocketAsync() method. The preliminary documentation seems to suggest that this method will support 2008 R2 and Windows 7. But this is almost certainly a misprint because I have inspected these areas of the .NET 4.5 libraries using Reflector and it is very clear that this is not the case:
The HttpListenerContext.AcceptWebSocketAsync() method directly calls into System.Net.WebSockets.WebSocketHelper (static class) which has a corresponding AcceptWebSocketAsync() method of its own. This method will then call a sanity check method tellingly named EnsureHttpSysSupportsWebSockets() which evaluates an expression containing the words “ComNetOS.IsWin8orLater”. I need say no more.
It seems clear now that Microsoft has chosen not to back port this minor HTTP Server API improvement to Server 2008 R2 / Windows 7. So now we must all hope that Windows Server 8 runs a beta program in tandem with the Windows 8 client, and that the two launch within a month of each other. Otherwise Metro app development possibilities are going to be severely limited whilst we all wait for the Windows Server product to host our WebSocket server applications! Even so, it’s a shame that Server 2008 R2 won’t ever be able to host WebSockets.
It will be interesting to see whether Willy Tarreau (of HAProxy fame) comes up with some enhancements in his project that might benefit those determined enough to still want to host (albeit, raw TCP-based) WebSockets on Server 2008 R2.
