I’ve been trying to decide what storage platform I should use for my application.
It will be storing what are essentially ever-growing (but potentially prunable past a certain age, say 1 to 3 years) transaction logs. Each record consists of four timestamp values (each 64 bits wide), three 32-bit integer values, and three text fields (two of which are generally of constrained length, say a maximum of 256 characters, and one typically longer but hopefully no more than about 1KB in the worst case).
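Concretely, the shape of each record looks something like this. A sketch in Python for illustration — the field names are my own invention, not the real schema:

```python
from dataclasses import dataclass

@dataclass
class TransactionLogRecord:
    # Four 64-bit timestamp values (e.g. ticks since an epoch).
    created: int
    submitted: int
    processed: int
    completed: int
    # Three 32-bit integer values.
    status: int
    source_id: int
    category: int
    # Two short text fields (max ~256 chars) and one longer
    # field (hopefully no more than ~1KB in the worst case).
    reference: str
    description: str
    payload: str
```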
Having tried out SQLite on my local machine (which has an SSD), I managed to insert 60,000 of these records in about 1 second flat. I was impressed but cautious: SQLite isn’t really a cloud-ready data store, and it would require quite a bit of concurrency-handling work to wrap it for what I’d need it to do. But I could not ignore that it was fast.
When I first read up on Azure Table Storage, I was a bit underwhelmed. It just seemed incredibly bloated and inefficient. It uses XML as its serialization format. It uses HTTP/S as its network transport (and there is no fast low-level interface available like there is for Azure Service Bus). If you’ve ever used Protocol Buffers, getting to know Azure Table Storage is a depressing experience: you can see the waste, but there is nothing you can do about it. Sure, you can override the serialization to remove its reliance on reflection and shorten the property names, but that’s only half the story.
I persisted anyway, and dived into Table Storage to give it a proper go and see what it could do.
I ran into a couple of problems, mostly with the .NET client API. I was submitting a batch of approximately 600 entities, and it came back with a rather vague and puzzling exception:
Microsoft.WindowsAzure.Storage.StorageException was caught
  HResult=-2146233088
  Message=Unexpected response code for operation : 99
  Source=Microsoft.WindowsAzure.Storage
  StackTrace:
       at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](StorageCommandBase`1 cmd, IRetryPolicy policy, OperationContext operationContext)
       at Microsoft.WindowsAzure.Storage.Table.TableBatchOperation.Execute(CloudTableClient client, String tableName, TableRequestOptions requestOptions, OperationContext operationContext)
       at Microsoft.WindowsAzure.Storage.Table.CloudTable.ExecuteBatch(TableBatchOperation batch, TableRequestOptions requestOptions, OperationContext operationContext)
       at Tests.WazTableSandbox.Write()
Nothing of any worth showed up on Google about this. I dug into it a bit further and noticed the extended exception information mentioned “InvalidInput” and “99:One of the request inputs is not valid.” Still not really that useful. Even Googling these gave me no clues as to what was wrong.
I read somewhere that Azure Table Storage batches are limited to 100 entities per batch. So I wrote a quick LINQ GroupBy to batch up my dataset by partition key (yes, that’s another requirement; all operations in a batch must be for the same partition key). Fortunately, the exception went away once I was correctly grouping them into batches of at most 100. Surely the .NET client API deserves a better, more self-explanatory exception message for this case, though? It’s blatantly going to be the first problem any developer encounters when trying to use batch operations.
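The batching logic amounts to grouping by partition key and then chunking each group into runs of at most 100. A minimal sketch (in Python rather than the C# LINQ I actually used):

```python
from itertools import groupby

MAX_BATCH_SIZE = 100  # Azure Table Storage limit per batch

def make_batches(entities, partition_key):
    """Group entities by partition key, then split each group into
    chunks of at most MAX_BATCH_SIZE. Every entity in a yielded
    batch shares the same partition key, as the service requires."""
    entities = sorted(entities, key=partition_key)  # groupby needs sorted input
    for _, group in groupby(entities, key=partition_key):
        group = list(group)
        for i in range(0, len(group), MAX_BATCH_SIZE):
            yield group[i:i + MAX_BATCH_SIZE]
```

Each yielded batch then maps to one batch operation submitted to the table (one `ExecuteBatch` call in the .NET client).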
With that solved, I continued with my tests.
My test data was batched up, by partition key, into these batch sizes: 26, 28, 22, 46, 51, 61, 32, 14, 46, 34, 31, 42, 59 and 8 (500 entities in total).
I then wrote some test code for SQLite that mirrored what I was doing with Table Storage. I made sure to use one SQLite transaction per batch, so that each batch would be written as an atomic unit. I purposefully gave SQLite an advantage by “preparing” the command (i.e. pre-compiling the SQL statement into byte code).
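For reference, the transaction-per-batch pattern looks like this in Python’s sqlite3 module (my actual test code was .NET; the table and column names here are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE tx_log (
    created INTEGER, submitted INTEGER, processed INTEGER, completed INTEGER,
    status INTEGER, source_id INTEGER, category INTEGER,
    reference TEXT, description TEXT, payload TEXT)""")

def write_batch(conn, batch):
    # One transaction per batch, so each batch commits as an atomic
    # unit; the parameterised statement is compiled once and reused
    # (the "prepared command" advantage mentioned above).
    with conn:  # commits on success, rolls back on exception
        conn.executemany(
            "INSERT INTO tx_log VALUES (?,?,?,?,?,?,?,?,?,?)",
            batch)

batch = [(1, 2, 3, 4, 0, 42, 7, "ref", "desc", "payload")] * 26
write_batch(conn, batch)
```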
I deployed my test program onto an Azure VM (“extra small”, if it matters) and ran it. Here’s what came out:
WazTable
Executing batch of 26
Executing batch of 28
Executing batch of 22
Executing batch of 46
Executing batch of 51
Executing batch of 61
Executing batch of 32
Executing batch of 14
Executing batch of 46
Executing batch of 34
Executing batch of 31
Executing batch of 42
Executing batch of 59
Executing batch of 8
00:00:01.8756798

Sqlite
Executing batch of 26
Executing batch of 28
Executing batch of 22
Executing batch of 46
Executing batch of 51
Executing batch of 61
Executing batch of 32
Executing batch of 14
Executing batch of 46
Executing batch of 34
Executing batch of 31
Executing batch of 42
Executing batch of 59
Executing batch of 8
00:00:03.4291801
So although SQLite was massively faster on my local SSD-powered workstation, it was substantially slower (almost 2x) when running from the Azure VM (and hence against a blob store). This was a bit disappointing, but it gives me confidence that I am using the right data storage tool for the job.
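Working that out as rough throughput figures, from the 500 entities total and the two elapsed times above:

```python
# Batch sizes from the test data, summed.
total_entities = 26 + 28 + 22 + 46 + 51 + 61 + 32 + 14 + 46 + 34 + 31 + 42 + 59 + 8  # 500

table_storage_secs = 1.8756798
sqlite_secs = 3.4291801

print(total_entities / table_storage_secs)  # ~267 entities/sec for Table Storage
print(total_entities / sqlite_secs)         # ~146 entities/sec for SQLite on the VM
print(sqlite_secs / table_storage_secs)     # ~1.83x slower
```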
You may be wondering why I even considered SQLite as an option in the first place. Well, good question. I am still on the fence as to whether my application will be “full cloud” or just a half-way house that can be installed somewhere without any cloudy stuff involved. That’s why I wanted to investigate SQLite, as it’s a standalone database. I might support both, in which case I would use SQLite for non-cloud deployments and Azure Table Storage for cloud deployments. I still find it disappointing how inefficiently Azure Table Storage has been designed. They really need to introduce a lower-level network transport like the one for Service Bus, and a better, XML-less serialization format.
Version numbers in the traditional sense have been largely obsolete ever since DVCSs arrived on the scene. They’re still useful for human consumption, but I prefer having the full Git (or Hg) changeset hash available on the assembly as well, so I can literally hit CTRL+SHIFT+G in GitExtensions (or the equivalent in TortoiseHg etc.) and paste in the changeset hash to go directly to it.
To set this up is fairly simple. First, ensure Git.exe is in your PATH. It generally will be if you installed Git with the default/recommended settings. Second, install the NuGet package for MSBuild Community Tasks. Make sure you use the older version available here, as there is a bug in the newer versions where the GitVersion task is broken. Then create a VersionAssemblyInfo.targets file in your solution root along these lines:
<?xml version="1.0" encoding="utf-8"?>
<Project ToolsVersion="4.0" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <Import Project="$(SolutionDir)\.build\MSBuild.Community.Tasks.targets" />
  <Target Name="UpdateVersionAssemblyInfo" BeforeTargets="BeforeBuild">
    <PropertyGroup>
      <Major>1</Major>
      <Minor>0</Minor>
      <Build>0</Build>
      <Revision>0</Revision>
      <GitHash>unknown</GitHash>
    </PropertyGroup>
    <GitVersion LocalPath="$(SolutionDir)" Short="false">
      <Output TaskParameter="CommitHash" PropertyName="GitHash" />
    </GitVersion>
    <AssemblyInfo CodeLanguage="CS"
                  OutputFile="$(SolutionDir)\VersionAssemblyInfo.cs"
                  AssemblyInformationalVersion="$(Major).$(Minor).$(Build).$(Revision) (git $(GitHash))"
                  AssemblyVersion="$(Major).$(Minor).$(Build).$(Revision)"
                  AssemblyFileVersion="$(Major).$(Minor).$(Build).$(Revision)" />
  </Target>
</Project>
The AssemblyInfo task will generate a VersionAssemblyInfo.cs in your solution root. Create a “Link” to this file in all your projects (in Visual Studio: Add → Existing Item → Add As Link).
Then in each project file you just need to add a new import, typically underneath the existing “Microsoft.CSharp.targets” import:
<Import Project="$(SolutionDir)\VersionAssemblyInfo.targets" />
Now rebuild your project(s). If you inspect the assembly with a tool like JustDecompile or Reflector and look at the assembly attributes, you should see that the (little-known and rarely used) AssemblyInformationalVersionAttribute is there and contains both the human-readable major.minor.build.revision and the Git changeset hash.
You may now choose to replace the various areas of your application that display version information with this attribute instead.
37signals’ Campfire product is a brilliant example of poor web application design. I would much rather use IRC and forfeit the tiny number of Campfire features that are actually useful, such as image in-lining and arguably YouTube clip in-lining too (though the latter is rarely used for actual work). I can’t remember the last time we actually used the log history/transcripts feature.
Some of the problems and annoyances we have with Campfire are:
- Switching between rooms requires a full page refresh and is very slow.
- Despite the illusion of tabs, you can only be in one room at a time. Unlike an IRC client, these fake tabs give no indication that room activity has occurred, so you must periodically switch between rooms to check for activity. If you genuinely need to be in multiple rooms at once, you have to open more tabs in your actual web browser.
- Timestamps for chat messages are grouped together (good in theory), but they only appear after some seemingly random amount of time (bad implementation in practice).
- There is no built-in notification/pop-up support. So you have to install hacks like the Kindling for Chrome extension.
- A third of the screen real estate is occupied by pointless stuff that you rarely use. Yes having a user list is important, but having it visible at all times is not.
- All of the “Campfire clients” for Windows are rubbish.
- But the biggest annoyance of all is when you blindly start typing a message and the web application is too dumb to auto-focus the cursor on the message entry text box. You can’t simply and predictably Tab your way to it either; the only reliable way to reach it is to focus it with the mouse. Bear in mind that this is a feature every IM client since about 1999 has had baked in.
Is it time for 37signals to drink some of their own Kool-Aid and actually “rework” their Campfire service?
Welp. I’ve finally scratched the itch to start blogging. So let’s see where this goes… 🙂