Nathan Evans' Nemesis of the Moment

Azure Table Storage versus SQLite (on Azure Blob Store)

Posted in .NET Framework, Distributed Systems, Uncategorized by Nathan B. Evans on March 31, 2013

I’ve been trying to decide what storage platform I should use for my application.

It will be storing what are essentially ever-growing (but potentially prune-able past a certain age, say 1 to 3 years) transaction logs. Each record consists of four timestamp values (each 64-bits wide), three 32-bit integer values, and three text fields (two of which are generally of constricted length, say a max. of 256 characters) but one of a typically longer length but hopefully not more than about 1KB at worst case.

Having tried out SQLite on my local machine (which has an SSD), I managed to insert 60,000 of these records in about 1 second flat. I was impressed but cautious, because SQLite isn’t really a cloud-ready data store and it would require quite a bit of work in wrapping up with concurrency handling to make it work for what I’d need it to do. But I could not ignore that it was fast.

When I first read up about Azure Table Storage, I was a bit underwhelmed. It just seemed incredibly bloated and inefficient. It uses XML as its serialization transport. It uses HTTP/S as its network transport (and there is no fast low-level interface available like there is for Azure Service Bus). If you’ve ever used ProtoBuf’s, getting to know Azure Table Storage is a depressing experience. You can see the wastage but there is nothing you can do. Sure you can override the serialization to remove its reliance on reflection and shorten up the property names, but that’s only half the story.

I persisted anyway, and dived into Table Storage to give it a proper go and see what it could do.

I ran into a couple problems, mostly with the .NET Client API. I was submitting a batch of approx. 600 entities. It was returning back to me with a rather vague and puzzling exception:

Microsoft.WindowsAzure.Storage.StorageException was caught
  HResult=-2146233088
  Message=Unexpected response code for operation : 99
  Source=Microsoft.WindowsAzure.Storage
  StackTrace:
       at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](StorageCommandBase`1 cmd, IRetryPolicy policy, OperationContext operationContext)
       at Microsoft.WindowsAzure.Storage.Table.TableBatchOperation.Execute(CloudTableClient client, String tableName, TableRequestOptions requestOptions, OperationContext operationContext)
       at Microsoft.WindowsAzure.Storage.Table.CloudTable.ExecuteBatch(TableBatchOperation batch, TableRequestOptions requestOptions, OperationContext operationContext)
       at Tests.WazTableSandbox.Write()

Nothing of any worth showed up on Google about this. I dug into it a bit further and noticed the extended exception information mentioned something about “InvalidInput” and “99:One of the request inputs is not valid.” Not really that useful still. Even Googling these gave me no clues as to what was wrong.

I read somewhere that Azure Table Storage batches are limited to 100 entities per batch. So I wrote a quick LINQ GroupBy to batch up my dataset by partition key (yes, that’s another requirement; batches of operations must all be for the same partition key). Fortunately, the exception went away once I was grouping them into batches of 100 correctly. Surely the .NET Client API deserves a better and more self-explanatory exception message for this edge case though? It’s blatantly going to be the first problem any developer encounters when trying to use CloudTable.ExecuteBatch().

With that solved, I continued with my tests.

My test data was batched up, by partition key, into these batch sizes: 26, 28, 22, 46, 51, 61, 32, 14, 46, 34, 31, 42, 59 and 8.

I then wrote some test code for SQLite that mirrored what I was doing with the Table Storage. I made sure to use a SQLite transaction per batch, so that each batch would be written as an atomic unit. I purposefully gave SQLite an advantage by “preparing” the command (i.e. pre-compiling the byte code for the SQL command).

I deployed my test program onto an Azure VM (“extra small”, if it matters?) and ran it. Here’s what came out:

WazTable
Executing batch of 26
Executing batch of 28
Executing batch of 22
Executing batch of 46
Executing batch of 51
Executing batch of 61
Executing batch of 32
Executing batch of 14
Executing batch of 46
Executing batch of 34
Executing batch of 31
Executing batch of 42
Executing batch of 59
Executing batch of 8
00:00:01.8756798

Sqlite
Executing batch of 26
Executing batch of 28
Executing batch of 22
Executing batch of 46
Executing batch of 51
Executing batch of 61
Executing batch of 32
Executing batch of 14
Executing batch of 46
Executing batch of 34
Executing batch of 31
Executing batch of 42
Executing batch of 59
Executing batch of 8
00:00:03.4291801

So although SQLite was massively faster on my local SSD-powered workstation. It was substantially slower (almost 2x) when running from the Azure VM (and hence on a blob store). This was a bit disappointing but it gives me confidence that I am using the right data storage tool for the job.

You may be wondering why I even considered SQLite as an option in the first place. Well, good question. I am still on the fence as to whether my application will be “full cloud” or just a half-way house that can be installed somewhere without any cloudy stuff involved. That’s why I wanted to investigate SQLite as it’s a standalone database. I might support both, in which case I would use SQLite for non-cloud deployments and Azure Table Storage for cloud deployments. I still find it disappointing how inefficient the Azure Table Storage has been designed. They really need to introduce a lower-level network transport like the one for Service Bus. And a better, XML-less, serialization format.

Tagged with: , ,