Recently I’ve been working on some interesting projects involving the eventually consistent Riak key-value database. Today I encountered a puzzling issue with a fresh cluster I was deploying. I had seemingly done everything exactly as I had in the past, yet something was causing Riak to fail to start up when run as a service.
[administrator@riak01 ~]$ sudo service riak start
Starting Riak: Riak failed to start within 15 seconds,
see the output of 'riak console' for more information.
If you want to wait longer, set the environment variable
WAIT_FOR_ERLANG to the number of seconds to wait.
[FAILED]
Bizarrely, in console mode it would start up fine. This led me to believe it was some sort of user or permissions issue, but I wasn’t totally sure. Perhaps I had accidentally executed Riak as another user and some sort of lock file had been created? Or was it perhaps a performance issue with the “Shared Core” Azure VMs I was using?
First I followed Riak’s advice and increased the WAIT_FOR_ERLANG environment variable, trying first 30 and then 60 seconds. This made no difference at all. I’m not even sure Riak was picking up my new value, as it kept printing “15 seconds” in its failure message.
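One possible explanation for the unchanged “15 seconds” message, which I have not verified on this machine, is that the variable never reached the init script. As far as I know, both sudo (with its default env_reset policy) and the service wrapper on Red Hat style systems run commands in a scrubbed environment. The sketch below only demonstrates that mechanism using env -i; the function names are hypothetical and nothing here touches Riak itself.

```shell
#!/bin/sh
# Illustration only: a variable set by the caller is lost when the
# child process is started with an emptied environment (env -i).

# Hypothetical helper: the variable is set for the command, but env -i wipes it.
show_wait_in_clean_env() {
    WAIT_FOR_ERLANG=60 env -i /bin/sh -c 'echo "${WAIT_FOR_ERLANG:-unset}"'
}

# Hypothetical helper: the variable is passed explicitly after the wipe,
# so the child process does see it.
show_wait_passed_explicitly() {
    env -i WAIT_FOR_ERLANG=60 /bin/sh -c 'echo "${WAIT_FOR_ERLANG:-unset}"'
}

show_wait_in_clean_env
show_wait_passed_explicitly
```

If the first call reports the variable as unset, the same thing could plausibly happen between an interactive shell and the Riak init script, which would explain why raising the timeout appeared to do nothing.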
I researched some more, and many places on the interweb suggested purging the /var/lib/riak/ring/ directory (don’t do this if you have valuable data stored on the Riak instance). I tried this, but it also had no effect.
But it turned out that the solution was incredibly simple. Riak had created some sort of temporary lock directory at /tmp/riak. All I had to do was delete this directory and, hey presto, Riak would now start perfectly fine as a service!
$ sudo rm -r /tmp/riak
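For a slightly more careful version of the same fix, here is a minimal sketch that prints the directory's ownership before deleting it, which helps confirm the "started as the wrong user" theory. The function name clear_riak_tmp is hypothetical, and the default path is simply the /tmp/riak location from this post; an optional argument overrides it.

```shell
#!/bin/sh
# Hypothetical helper: inspect and remove a leftover Riak temp directory.
# Defaults to /tmp/riak (the path from this post); pass another path to override.
clear_riak_tmp() {
    dir="${1:-/tmp/riak}"

    # Nothing to clean up if the directory is not there.
    if [ ! -d "$dir" ]; then
        echo "nothing to do: $dir does not exist"
        return 0
    fi

    # Show owner and permissions first, so a wrong-user start is visible.
    ls -ld "$dir"

    # Remove the directory and everything in it.
    rm -r -- "$dir" && echo "removed $dir"
}
```

Run as root, calling clear_riak_tmp with no arguments and then starting the service again is equivalent to the one-liner above, just with the ownership printed first.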
There may be more posts on the subject of Riak soon. 🙂
PS: I am using Riak version 1.2.1, on CentOS 6.3.