Azure Table Storage versus SQLite (on Azure Blob Store)
I’ve been trying to decide what storage platform I should use for my application.
It will be storing what are essentially ever-growing (but potentially prunable past a certain age, say 1 to 3 years) transaction logs. Each record consists of four timestamp values (each 64 bits wide), three 32-bit integer values, and three text fields: two generally of constrained length (say a maximum of 256 characters), and one typically longer, but hopefully no more than about 1KB in the worst case.
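To make that concrete, the record shape looks something like this (a sketch only; the field names here are my own invention, not the real schema):

// Illustrative record layout; names are hypothetical.
public class TransactionLogRecord
{
    // Four 64-bit timestamp values.
    public long CreatedUtc { get; set; }
    public long ReceivedUtc { get; set; }
    public long ProcessedUtc { get; set; }
    public long CompletedUtc { get; set; }

    // Three 32-bit integer values.
    public int SourceId { get; set; }
    public int Category { get; set; }
    public int StatusCode { get; set; }

    // Three text fields: two short (max ~256 chars), one longer (up to ~1KB).
    public string Reference { get; set; }
    public string User { get; set; }
    public string Detail { get; set; }
}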
Having tried out SQLite on my local machine (which has an SSD), I managed to insert 60,000 of these records in about 1 second flat. I was impressed but cautious, because SQLite isn’t really a cloud-ready data store, and it would take quite a bit of work to wrap it with concurrency handling to make it do what I’d need. But I couldn’t ignore that it was fast.
When I first read up about Azure Table Storage, I was a bit underwhelmed. It just seemed incredibly bloated and inefficient. It uses XML as its serialization transport. It uses HTTP/S as its network transport (and there is no fast low-level interface available like there is for Azure Service Bus). If you’ve ever used ProtoBufs, getting to know Azure Table Storage is a depressing experience: you can see the wastage, but there is nothing you can do about it. Sure, you can override the serialization to remove its reliance on reflection and shorten up the property names, but that’s only half the story.
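For what it’s worth, that override looks roughly like this (a minimal sketch of a hand-rolled ITableEntity; the entity shape and the single-letter wire names are my own, not from my real code):

using System;
using System.Collections.Generic;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Implementing ITableEntity by hand bypasses the reflection-based
// serializer and lets us shorten the property names sent over the wire.
public class LogEntity : ITableEntity
{
    public string PartitionKey { get; set; }
    public string RowKey { get; set; }
    public DateTimeOffset Timestamp { get; set; }
    public string ETag { get; set; }

    public DateTimeOffset Created { get; set; }
    public int Code { get; set; }
    public string Message { get; set; }

    public void ReadEntity(IDictionary<string, EntityProperty> props, OperationContext ctx)
    {
        Created = props["c"].DateTimeOffsetValue.Value;
        Code = props["n"].Int32Value.Value;
        Message = props["m"].StringValue;
    }

    public IDictionary<string, EntityProperty> WriteEntity(OperationContext ctx)
    {
        // Single-letter names trim each XML element, but the
        // envelope overhead remains — hence "only half the story".
        return new Dictionary<string, EntityProperty>
        {
            { "c", EntityProperty.GeneratePropertyForDateTimeOffset(Created) },
            { "n", EntityProperty.GeneratePropertyForInt(Code) },
            { "m", EntityProperty.GeneratePropertyForString(Message) },
        };
    }
}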
I persisted anyway, and dived into Table Storage to give it a proper go and see what it could do.
I ran into a couple of problems, mostly with the .NET Client API. I was submitting a batch of approx. 600 entities, and it came back with a rather vague and puzzling exception:
Microsoft.WindowsAzure.Storage.StorageException was caught
  HResult=-2146233088
  Message=Unexpected response code for operation : 99
  Source=Microsoft.WindowsAzure.Storage
  StackTrace:
       at Microsoft.WindowsAzure.Storage.Core.Executor.Executor.ExecuteSync[T](StorageCommandBase`1 cmd, IRetryPolicy policy, OperationContext operationContext)
       at Microsoft.WindowsAzure.Storage.Table.TableBatchOperation.Execute(CloudTableClient client, String tableName, TableRequestOptions requestOptions, OperationContext operationContext)
       at Microsoft.WindowsAzure.Storage.Table.CloudTable.ExecuteBatch(TableBatchOperation batch, TableRequestOptions requestOptions, OperationContext operationContext)
       at Tests.WazTableSandbox.Write()
Nothing of any worth showed up on Google about this. I dug into it a bit further and noticed the extended exception information mentioned “InvalidInput” and “99: One of the request inputs is not valid.” Still not very useful. Even Googling these gave me no clues as to what was wrong.
I read somewhere that Azure Table Storage batches are limited to 100 entities per batch. So I wrote a quick LINQ GroupBy to split my dataset up by partition key (yes, that’s another requirement: all operations in a batch must target the same partition key). Fortunately, once I was grouping entities correctly into batches of at most 100, the exception went away. Surely the .NET Client API deserves a better, more self-explanatory exception message for this case though? It’s blatantly going to be the first problem any developer encounters when trying to use CloudTable.ExecuteBatch().
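For reference, the grouping looks roughly like this (a sketch: entities, the entity type and the CloudTable instance table are assumed to be in scope, and System.Linq is required):

// Group by partition key, then chunk each group into runs of <= 100,
// since a single TableBatchOperation may only target one partition
// and may contain at most 100 operations.
var batches = entities
    .GroupBy(e => e.PartitionKey)
    .SelectMany(g => g
        .Select((e, i) => new { e, i })
        .GroupBy(x => x.i / 100, x => x.e));

foreach (var batch in batches)
{
    var op = new TableBatchOperation();
    foreach (var entity in batch)
        op.Insert(entity);

    Console.WriteLine("Executing batch of {0}", op.Count);
    table.ExecuteBatch(op); // one round-trip per batch
}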
With that solved, I continued with my tests.
My test data was batched up, by partition key, into these batch sizes: 26, 28, 22, 46, 51, 61, 32, 14, 46, 34, 31, 42, 59 and 8.
I then wrote some test code for SQLite that mirrored what I was doing with Table Storage. I made sure to use a SQLite transaction per batch, so that each batch would be written as an atomic unit. I purposely gave SQLite an advantage by “preparing” the command (i.e. pre-compiling the byte code for the SQL statement).
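The SQLite side followed this pattern (a sketch using the System.Data.SQLite API; the table schema and names here are illustrative, not my exact test code):

using System.Data.SQLite;

using (var conn = new SQLiteConnection("Data Source=test.db"))
{
    conn.Open();
    new SQLiteCommand(
        "CREATE TABLE IF NOT EXISTS log (created INTEGER, code INTEGER, message TEXT)",
        conn).ExecuteNonQuery();

    using (var cmd = new SQLiteCommand(
        "INSERT INTO log (created, code, message) VALUES (?, ?, ?)", conn))
    {
        // Unnamed (positional) parameters; the command is prepared once
        // and reused for every row.
        cmd.Parameters.Add(new SQLiteParameter());
        cmd.Parameters.Add(new SQLiteParameter());
        cmd.Parameters.Add(new SQLiteParameter());
        cmd.Prepare(); // pre-compile the statement's byte code up front

        foreach (var batch in batches)
        {
            // One transaction per batch, so each batch commits atomically —
            // mirroring a Table Storage entity group transaction.
            using (var tx = conn.BeginTransaction())
            {
                cmd.Transaction = tx;
                foreach (var r in batch)
                {
                    cmd.Parameters[0].Value = r.CreatedUtc;
                    cmd.Parameters[1].Value = r.StatusCode;
                    cmd.Parameters[2].Value = r.Detail;
                    cmd.ExecuteNonQuery();
                }
                tx.Commit();
            }
        }
    }
}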
I deployed my test program onto an Azure VM (“extra small”, if it matters?) and ran it. Here’s what came out:
WazTable
Executing batch of 26
Executing batch of 28
Executing batch of 22
Executing batch of 46
Executing batch of 51
Executing batch of 61
Executing batch of 32
Executing batch of 14
Executing batch of 46
Executing batch of 34
Executing batch of 31
Executing batch of 42
Executing batch of 59
Executing batch of 8
00:00:01.8756798

Sqlite
Executing batch of 26
Executing batch of 28
Executing batch of 22
Executing batch of 46
Executing batch of 51
Executing batch of 61
Executing batch of 32
Executing batch of 14
Executing batch of 46
Executing batch of 34
Executing batch of 31
Executing batch of 42
Executing batch of 59
Executing batch of 8
00:00:03.4291801
So although SQLite was massively faster on my local SSD-powered workstation, it was substantially slower (almost 2x) when running from the Azure VM (and hence writing to a blob store). This was a bit disappointing, but it gives me confidence that I’m using the right data storage tool for the job.
You may be wondering why I even considered SQLite as an option in the first place. Well, good question. I am still on the fence as to whether my application will be “full cloud” or just a half-way house that can be installed somewhere without any cloudy stuff involved. That’s why I wanted to investigate SQLite: it’s a standalone database. I might support both, in which case I would use SQLite for non-cloud deployments and Azure Table Storage for cloud deployments. I still find it disappointing how inefficiently Azure Table Storage has been designed. They really need to introduce a lower-level network transport, like the one for Service Bus, and a better, XML-free serialization format.
Thanks for the benchmarks. It’s good to have some information about the speed of table storage.
Did you do any analysis to see why SQLite did so poorly on the Azure VM? Given your discovery about the technology Table Storage uses, it seems counter-intuitive that Table Storage would perform better/faster in the test you ran.
It’s not really possible to determine precisely why it performed that way. For what it’s worth, I don’t think it “performed badly”, just that it was slower than Table Storage. It must be down to the way Azure VMs are set up to run their system drive virtualised on top of a blob storage account. Maybe disabling “geo-replication” on the storage account would have helped substantially. Maybe I should have run another test against the VM’s “temporary storage” drive, which sits locally on the VM’s host machine. I don’t know whether that drive is an SSD or HDD though; MS doesn’t document that detail as far as I know.
Thanks for the reply. I would expect SQLite on the VM to perform better, as it’s not using a WCF Data Service with XML serialization on both ends, plus Table Storage also needs to write to storage, which I’m assuming is likewise virtualised? Are there cost differences between SQLite and Table Storage?
Whilst it doesn’t use Data Services under the hood, it still had to use the virtual drive (blob store), in my test at least, so there was still network or SAN I/O involved. Who knows what optimisations Azure runs under the hood for I/O of all kinds.
As for cost differences: SQLite almost certainly costs more. It is a B+tree data structure sitting on disk, so often just to make a write it must first perform some reads and then several writes. Rarely will a single row update result in a single disk write. But again, who knows what optimisations Azure does under the hood for cloud drives. Perhaps they batch writes up and flush them after an interval.