Nathan Evans' Nemesis of the Moment

Azure Table Storage versus SQLite (on Azure Blob Store)

Posted in .NET Framework, Distributed Systems, Uncategorized by Nathan B. Evans on March 31, 2013

I’ve been trying to decide what storage platform I should use for my application.

It will be storing what are essentially ever-growing (but potentially prune-able past a certain age, say 1 to 3 years) transaction logs. Each record consists of four timestamp values (each 64 bits wide), three 32-bit integer values, and three text fields: two are of constrained length (say a max. of 256 characters), while the third is typically longer but hopefully no more than about 1KB in the worst case.
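For concreteness, here is roughly what one of those records looks like as a C# class. All of the names are mine, purely for illustration; they are not the application's actual schema.

    // A rough sketch of the record described above; names are illustrative.
    public class TransactionLogEntry
    {
        // Four 64-bit timestamp values
        public long CreatedUtc { get; set; }
        public long ReceivedUtc { get; set; }
        public long ProcessedUtc { get; set; }
        public long CompletedUtc { get; set; }

        // Three 32-bit integer values
        public int SourceId { get; set; }
        public int TypeCode { get; set; }
        public int StatusCode { get; set; }

        // Three text fields: two short (max ~256 chars), one up to ~1KB
        public string Reference { get; set; }
        public string Category { get; set; }
        public string Detail { get; set; }
    }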

Having tried out SQLite on my local machine (which has an SSD), I managed to insert 60,000 of these records in about 1 second flat. I was impressed but cautious, because SQLite isn’t really a cloud-ready data store and it would need quite a bit of wrapping with concurrency handling to make it do what I need. But I could not ignore that it was fast.

When I first read up on Azure Table Storage, I was a bit underwhelmed. It just seemed incredibly bloated and inefficient. It uses XML as its serialization transport. It uses HTTP/S as its network transport (and there is no fast low-level interface available like there is for Azure Service Bus). If you’ve ever used Protocol Buffers, getting to know Azure Table Storage is a depressing experience. You can see the wastage but there is nothing you can do about it. Sure, you can override the serialization to remove its reliance on reflection and shorten the property names, but that’s only half the story.
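To illustrate that serialization override (a sketch only, against the 2.x storage client's TableEntity hooks; the entity shape and one-letter wire names here are hypothetical):

    using System.Collections.Generic;
    using Microsoft.WindowsAzure.Storage;
    using Microsoft.WindowsAzure.Storage.Table;

    // Bypasses the reflection-based (de)serialization and shortens the
    // property names on the wire; only two fields shown for brevity.
    public class CompactLogEntity : TableEntity
    {
        public long CreatedUtc { get; set; }
        public string Detail { get; set; }

        public override IDictionary<string, EntityProperty> WriteEntity(OperationContext ctx)
        {
            return new Dictionary<string, EntityProperty>
            {
                { "c", EntityProperty.GeneratePropertyForLong(CreatedUtc) },
                { "d", EntityProperty.GeneratePropertyForString(Detail) }
            };
        }

        public override void ReadEntity(IDictionary<string, EntityProperty> props, OperationContext ctx)
        {
            CreatedUtc = props["c"].Int64Value ?? 0;
            Detail = props["d"].StringValue;
        }
    }

Even with both tricks applied, every entity still travels as verbose XML over HTTP, which is the half you can’t fix.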

I persisted anyway, and dived into Table Storage to give it a proper go and see what it could do.

I ran into a couple of problems, mostly with the .NET Client API. I was submitting a batch of approx. 600 entities, and it came back with a rather vague and puzzling exception:

Microsoft.WindowsAzure.Storage.StorageException was caught
  HResult=-2146233088
  Message=Unexpected response code for operation : 99
  Source=Microsoft.WindowsAzure.Storage
  StackTrace:
       at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](StorageCommandBase`1 cmd, IRetryPolicy policy, OperationContext operationContext)
       at Microsoft.WindowsAzure.Storage.Table.TableBatchOperation.Execute(CloudTableClient client, String tableName, TableRequestOptions requestOptions, OperationContext operationContext)
       at Microsoft.WindowsAzure.Storage.Table.CloudTable.ExecuteBatch(TableBatchOperation batch, TableRequestOptions requestOptions, OperationContext operationContext)
       at Tests.WazTableSandbox.Write()

Nothing of any worth showed up on Google about this. I dug into it a bit further and noticed the extended exception information mentioned something about “InvalidInput” and “99:One of the request inputs is not valid.” Still not much use, and Googling these gave me no clues as to what was wrong either.
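(For anyone else hitting this: the useful detail is buried in the exception’s RequestInformation rather than its Message. A catch block along these lines surfaces it; `table` and `batch` stand in for my test fixtures.)

    try
    {
        table.ExecuteBatch(batch);
    }
    catch (StorageException ex)
    {
        // ex.Message is vague; the real error code and message live here.
        var info = ex.RequestInformation.ExtendedErrorInformation;
        Console.WriteLine("{0}: {1}", info.ErrorCode, info.ErrorMessage);
        throw;
    }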

I read somewhere that Azure Table Storage batches are limited to 100 entities each. So I wrote a quick LINQ GroupBy to chunk up my dataset by partition key (yes, that’s another requirement: all operations in a batch must be for the same partition key). Fortunately, the exception went away once I was grouping them into batches of 100 correctly. Surely the .NET Client API deserves a better, more self-explanatory exception message for this edge case though? It’s blatantly going to be the first problem any developer encounters when trying to use CloudTable.ExecuteBatch().
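In case it saves someone else the same head-scratching, the grouping looked roughly like this (a sketch; `entities` and `table` stand in for my real test data and table reference):

    // Group by partition key, then chunk each group into batches of <= 100.
    var batches = entities
        .GroupBy(e => e.PartitionKey)
        .SelectMany(g => g
            .Select((entity, index) => new { entity, index })
            .GroupBy(x => x.index / 100, x => x.entity));

    foreach (var batch in batches)
    {
        var operation = new TableBatchOperation();
        foreach (var entity in batch)
            operation.Insert(entity);

        // Safe now: one partition key per batch, at most 100 entities.
        table.ExecuteBatch(operation);
    }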

With that solved, I continued with my tests.

My test data was batched up, by partition key, into these batch sizes: 26, 28, 22, 46, 51, 61, 32, 14, 46, 34, 31, 42, 59 and 8.

I then wrote some test code for SQLite that mirrored what I was doing with the Table Storage. I made sure to use a SQLite transaction per batch, so that each batch would be written as an atomic unit. I purposefully gave SQLite an advantage by “preparing” the command (i.e. pre-compiling the byte code for the SQL command).
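The SQLite side looked roughly like this (again a sketch, assuming System.Data.SQLite; the table and column names are placeholders and only three of the ten columns are shown):

    using System.Data;
    using System.Data.SQLite;

    using (var conn = new SQLiteConnection("Data Source=test.db"))
    {
        conn.Open();
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText =
                "INSERT INTO TransactionLog (CreatedUtc, TypeCode, Detail) " +
                "VALUES (@created, @type, @detail)";
            cmd.Parameters.Add("@created", DbType.Int64);
            cmd.Parameters.Add("@type", DbType.Int32);
            cmd.Parameters.Add("@detail", DbType.String);
            cmd.Prepare();   // pre-compile the statement's byte code, once

            foreach (var batch in batches)
            {
                // One transaction per batch, mirroring ExecuteBatch's atomicity.
                using (var txn = conn.BeginTransaction())
                {
                    cmd.Transaction = txn;
                    foreach (var record in batch)
                    {
                        cmd.Parameters["@created"].Value = record.CreatedUtc;
                        cmd.Parameters["@type"].Value = record.TypeCode;
                        cmd.Parameters["@detail"].Value = record.Detail;
                        cmd.ExecuteNonQuery();
                    }
                    txn.Commit();
                }
            }
        }
    }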

I deployed my test program onto an Azure VM (“extra small”, if it matters?) and ran it. Here’s what came out:

WazTable
Executing batch of 26
Executing batch of 28
Executing batch of 22
Executing batch of 46
Executing batch of 51
Executing batch of 61
Executing batch of 32
Executing batch of 14
Executing batch of 46
Executing batch of 34
Executing batch of 31
Executing batch of 42
Executing batch of 59
Executing batch of 8
00:00:01.8756798

Sqlite
Executing batch of 26
Executing batch of 28
Executing batch of 22
Executing batch of 46
Executing batch of 51
Executing batch of 61
Executing batch of 32
Executing batch of 14
Executing batch of 46
Executing batch of 34
Executing batch of 31
Executing batch of 42
Executing batch of 59
Executing batch of 8
00:00:03.4291801

So although SQLite was massively faster on my local SSD-powered workstation, it was substantially slower (almost 2x) when running on the Azure VM (and hence against a blob store). This was a bit disappointing, but it gives me confidence that I am using the right data storage tool for the job.

You may be wondering why I even considered SQLite as an option in the first place. Well, good question. I am still on the fence as to whether my application will be “full cloud” or just a half-way house that can be installed somewhere without any cloudy stuff involved. That’s why I wanted to investigate SQLite, as it’s a standalone database. I might support both, in which case I would use SQLite for non-cloud deployments and Azure Table Storage for cloud deployments. I still find it disappointing how inefficiently Azure Table Storage has been designed. They really need to introduce a lower-level network transport like the one for Service Bus, and a better, XML-less serialization format.
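If I do end up supporting both, the natural shape is a small storage-neutral seam with one implementation per backend; something like this (names illustrative, reusing the sketch record from earlier):

    // One interface, two implementations: pick per deployment.
    public interface ITransactionLogStore
    {
        void Append(IEnumerable<TransactionLogEntry> batch);   // one atomic batch
    }

    // e.g. class SqliteTransactionLogStore : ITransactionLogStore { ... }
    //      class AzureTableTransactionLogStore : ITransactionLogStore { ... }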


5 Responses


  1. Simon Timms said, on April 21, 2013 at 2:09 AM

    Thanks for the benchmarks. It’s good to have some information about the speed of table storage.

  2. ItsMe said, on August 28, 2013 at 11:05 AM

    Did you do any analysis to see why SQLite did so poorly on the Azure VM? Given your discovery about the technology underpinning Table Storage, it seems counterintuitive that it would perform better/faster in the test you ran.

    • Nathan Evans said, on August 28, 2013 at 11:10 AM

      It’s not really possible to determine precisely why it performed that way. For what it’s worth, I don’t think it “performed badly”, just that it was slower than Table Storage. It must be down to the way Azure VMs are set up to run their system drive virtualised on top of a blob storage account. Maybe disabling “Geo-replication” on the storage account would have helped substantially. Maybe I should have done another test against the VM’s “temporary storage” drive, which sits locally on the VM’s host machine. I don’t know whether this drive is an SSD or HDD though; MS doesn’t document that detail as far as I know.

  3. ItsMe said, on August 28, 2013 at 3:36 PM

    Thanks for the reply. I would expect SQLite on the VM to perform better, as it is not going through a WCF Data Service with XML serialization on both ends, plus the write to storage, which I’m assuming is also accomplished via virtual storage for Table Storage? Are there cost differences between SQLite and Table Storage?

    • Nathan Evans said, on August 28, 2013 at 3:44 PM

      Whilst it doesn’t use Data Services under the hood, it still had to use (in my test at least) the virtual drive (blob store), so there was still network or SAN I/O involved. Who knows what optimisations Azure is running under the hood for I/O of all forms.

      As for cost differences: SQLite almost certainly costs more. It is a B+tree data structure sitting on disk, so often just to make a write you must first make some reads, then some writes. Rarely is a single row update going to result in a single write transaction. But again, who knows what optimisations Azure does under the hood for cloud drives. Perhaps they batch them up and flush them after an interval.

