Nathan Evans' Nemesis of the Moment

Riak timing out during startup

Posted in Databases, Unix Environment by Nathan B. Evans on February 6, 2013

Recently I’ve been working on some interesting projects involving the eventually consistent Riak key-value database. Today I encountered a puzzling issue with a fresh cluster I was deploying. I had seemingly done everything exactly as I had in the past, except something was causing it to fail to start up when run as a service.

[administrator@riak01 ~]$ sudo service riak start
Starting Riak: Riak failed to start within 15 seconds,
see the output of 'riak console' for more information.
If you want to wait longer, set the environment variable
WAIT_FOR_ERLANG to the number of seconds to wait.
[FAILED]

Bizarrely, in console mode it would start up fine. This led me to believe it was some sort of user or permissions issue, but I wasn’t totally sure. Perhaps I had accidentally executed Riak as another user and some sort of lock file was created? Or was it perhaps a performance issue with the “Shared Core” Azure VMs I was using?

First I followed Riak’s advice and increased the WAIT_FOR_ERLANG environment variable, trying first 30 and then 60 seconds. But this made no difference at all. I’m not even sure Riak was using my new value, as it still kept printing “15 seconds” as its reason for failing to start.
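One possible explanation for the unchanged “15 seconds”, which I have not confirmed against Riak’s own scripts, is that the variable never reached the init script: sudo resets the environment by default, and the service command on CentOS launches init scripts with a cleaned environment. The sketch below only illustrates that effect. The show_wait and show_wait_clean_env functions are hypothetical stand-ins that fall back to 15 when the variable is missing; they are not Riak’s actual code.

```shell
#!/bin/sh
# Hypothetical stand-in for a script that falls back to 15 seconds
# when WAIT_FOR_ERLANG is not set. Not taken from Riak itself.
show_wait() {
    echo "${WAIT_FOR_ERLANG:-15}"
}

# The same fallback, but run in a child shell whose environment has
# been wiped with `env -i`, so an exported value cannot get through.
show_wait_clean_env() {
    env -i /bin/sh -c 'echo "${WAIT_FOR_ERLANG:-15}"'
}

export WAIT_FOR_ERLANG=60

show_wait            # the exported value is visible here
show_wait_clean_env  # the exported value is gone, so the fallback is used
```

If this is what happened, the value would need to be set somewhere the init script itself reads, rather than exported in the interactive shell.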

I researched some more, and many places on the interweb suggested purging the /var/lib/riak/ring/ directory (don’t do this if you have valuable data stored on the Riak instance). I tried this, but it also had no effect.

But it turned out that the solution was incredibly simple. Riak had created some sort of temporary lock directory at /tmp/riak. All I had to do was delete this directory and, hey presto, Riak would now start perfectly fine as a service!

$ sudo rm -r /tmp/riak
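Before deleting anything, it can be worth checking whether the directory is actually the problem. Here is a minimal sketch of such a check. The check_dir function is a hypothetical helper of my own, not part of Riak, and it assumes that the thing to test is whether the user running it can write to the directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of Riak): report the state of a
# directory as seen by the user running this script.
check_dir() {
    dir="$1"
    if [ ! -e "$dir" ]; then
        echo "missing"
    elif [ ! -d "$dir" ]; then
        echo "not-a-directory"
    elif [ -w "$dir" ]; then
        echo "writable"
    else
        echo "not-writable"
    fi
}

# Default to /tmp/riak when no path is given on the command line.
check_dir "${1:-/tmp/riak}"
```

To mirror what the service sees, run it as the account Riak runs under. I am assuming that account is called riak, as created by the packages; with the sketch saved under a hypothetical name such as check_dir.sh, that would be something like sudo -u riak sh check_dir.sh /tmp/riak. Running ls -ld /tmp/riak also shows who owns the directory.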

There may be more posts on the subject of Riak soon. 🙂

PS: I am using Riak version 1.2.1, on CentOS 6.3.


3 Responses


  1. Adron said, on February 12, 2013 at 5:32 PM

    Hey Nathan! Cool to see you’re hacking through the Azure VMs with Riak. I’m kicking off some comparisons myself and will be blogging it here (http://compositecode.com) and prospectively getting some of the material up on the Basho Docs & Blog (I’m a coder w/ the company). If you have any questions feel free to reach out to me (on twitter at @adron) and related – would love to trade notes sometime on what we’re each respectively working on.

    Cheers!

  2. Jared (@_jared) said, on February 14, 2013 at 3:01 PM

    Nathan,

    Since this has happened to a few people before, a warning has been added to the 1.3 release for this very issue. Now startup should complain loudly if the /tmp/riak directory is not writeable.

    https://github.com/basho/riak/pull/194

  3. […] it should be OK, but sometimes removing this will delete any locks causing RIAK not to start. (Thanks to Nathan Evans) To fix this just […]

