Nathan Evans' Nemesis of the Moment

Targeting Mono in Visual Studio 2012

Posted in .NET Framework, Unix Environment, Windows Environment by Nathan B. Evans on February 13, 2013

These steps are known good on my Windows 8 machine, with Visual Studio 2012 w/ Update 1 and Mono 2.10.9.

  1. Install Mono for Windows, from http://www.go-mono.com/mono-downloads/download.html
    Choose a decent path, which for me was C:\Program Files (x86)\Mono-2.10.9
  2. Load an elevated administrative Command Prompt (Top tip: On Windows 8, hit WinKey+X then choose “Command Prompt (Admin)”)
  3. From this Command Prompt, execute the following commands (in order):
    $ cd "C:\Program Files (x86)\Reference Assemblies\Microsoft\Framework\.NETFramework\v4.0\Profile"
    $ mklink /d Mono "C:\Program Files (x86)\Mono-2.10.9\lib\mono\4.0"
    $ cd Mono
    $ mkdir RedistList
    $ cd RedistList
    $ notepad FrameworkList.xml
  4. Notepad will start and ask about creating a new file, choose Yes.
  5. Now paste in this text and Save the file:
    <?xml version="1.0" encoding="UTF-8"?>
    <FileList ToolsVersion="4.0" RuntimeVersion="4.0" Name="Mono 2.10.9 Profile" Redist="Mono_2.10.9">
    </FileList>
  6. From the same Command Prompt, type:
    $ regedit
  7. In the Registry Editor, navigate to: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\v4.0.30319\SKUs\ and create a new Key folder called .NETFramework,Version=v4.0,Profile=Mono
  8. Now navigate to HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\.NETFramework\v4.0.30319\SKUs\ and create the same Key folder again here (this step is only necessary for x64 machines, but since practically all software developers use those, you’ll probably need to do it!)
  9. Now load up VS2012 and a project. Go to the Project Properties (Alt+Enter whilst having it selected in Solution Explorer).
  10. Choose the Application tab on the left and look for the “Target framework” drop-down list.
  11. On this list you should see an entry called “Mono 2.10.9 Profile”.
  12. Select it, and Visual Studio should then convert your project to target the Mono framework. You’ll notice that it will re-reference all your various System assemblies and if you study the filenames they will point to the Mono ones that were created during Step #3.

Note: I was scratching my head at first as I kept getting an error from Visual Studio saying:

This application requires one of the following versions of the .NET Framework:
.NETFramework,Version=v4.0,Profile=Mono

Do you want to install this .NET Framework version now?

Turns out that even on an x64 machine you MUST add both Registry key SKUs (see Steps #7 and #8). It is not enough to just add the Wow6432Node key; you must add the other as well. I can only assume this is because VS2012 is a 32-bit application. But maybe it’s also affected by whether you’re compiling to Any CPU, x64 or x86… who knows. It doesn’t really matter as long as this fixes it, which it does!

Building and executing your first program

Now that your development environment is nicely set up, you can proceed and build your first program.

The Mono equivalent of MSBuild is called XBuild (not sure why they didn’t call it MBuild or something!). You can build your .csproj by doing the following:

  1. Load the Mono Command Prompt (it will be on your Start Menu/Screen, just search for “mono”).
  2. Change directory to your project folder.
  3. Execute the following command to build your project using the Mono compiler:
    $ xbuild /p:TargetFrameworkProfile=""
    Note: You must specify the blank TargetFrameworkProfile parameter as otherwise the compiler will issue warnings along the lines of:

    Unable to find framework corresponding to the target framework moniker ‘.NETFramework,Version=v4.0,Profile=Mono’. Framework assembly references will be resolved from the GAC, which might not be the intended behavior.

  4. Hopefully you’ll not have any errors from the build…
  5. Now you can run your program using the Mono runtime. To do this:
    $ mono --gc=sgen "bin\Debug\helloworld.exe"
    Note: You'll definitely want to use the "Sgen" garbage collector (hence the parameter) as the default one in Mono is unbearably slow.
  6. You can do a quick “smoke test” to verify everything is in order with both your compilation and execution. Have your program execute something like:
    Console.Write(typeof (Console).Assembly.CodeBase);
    … and this should print out a path similar to:
    file:///C:/PROGRA~2/MONO-2~1.9/lib/mono/4.0/mscorlib.dll
    I’ve no idea why it prints it out using 8.3 filename format, but there you go! You’ll notice that if you run your program outside of the Mono runtime then it will pick up the Microsoft CLR version from the GAC.

Simulating the P of CAP on a Riak cluster

Posted in .NET Framework, Automation, Unix Environment by Nathan B. Evans on February 10, 2013

When developing and testing a distributed system, one of the essential concerns you will deal with is eventual consistency (EC). There are plenty of articles covering EC so I’m not going to dwell on that much here. What I am going to talk about is testing, particularly of the automated kind.

Testing an eventually consistent system is difficult because everything is transient and unpredictable. Unless you tune your consistency values like N and DW then you’re offered no guarantees about the extent of propagation of your commit around the cluster. And whilst consistency tuning may be acceptable for some tests, it most definitely won’t be acceptable for tests that cover concurrent write concerns such as your sibling resolution protocols.

What is “partition tolerance”?

A network partition is where a single node or a group of nodes in the cluster becomes segregated from the rest of the cluster. I liken it to a “net split” on an IRC network. The system continues operating but it has been split into two or more parts; partition tolerance is the system’s ability to keep working in that state. When a node has become segregated from the rest of the cluster, it does not necessarily mean that its clients cannot reach it. Therefore all nodes can continue to accept writes on the dataset.

Generally speaking, a partition event is a transient situation. It may last a few seconds, a few minutes, hours, days or even weeks. But the expectation is that eventually the partition will be repaired and the cluster returned to full health.

Fig. 1: State changes of a cluster during a partition event

In the diagram (Fig. 1) there is a cluster comprised of three nodes, and this is the sequence of events:

  1. N2 becomes network partitioned from N1 and N3.
    N1 and N3 can continue replicating between one another.
  2. Client(s) connected to either N1 or N3 perform two writes (indicated by the green and orange).
  3. Client(s) connected to N2 perform three writes (purple, red, blue).
  4. When the network partition is resolved, the cluster begins to heal by merging the commits between the nodes.
  5. The two commits from step 2 (which were already replicated between N1 and N3) are propagated onto N2.
  6. The purple, red and blue commits on N2 are propagated onto N1 and N3.
  7. The cluster is now fully healed.
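
As a toy illustration of the sequence above, here is a minimal shell sketch that treats each side of the partition as a file with one commit per line and "heals" by taking the union. This is only a model: the function name `heal_partition` is made up for the example, and a real Riak cluster also has to reconcile conflicting writes to the same key (siblings), which a plain set union ignores.

```shell
#!/bin/sh
# Toy model of Fig. 1: each side of the partition is a file, one commit per line.
# Assumption: healing is modelled as a plain set union of the two commit lists.
# Real sibling resolution for conflicting writes is deliberately left out.

heal_partition() {
    # $1 = commits held by N1/N3, $2 = commits held by N2
    # sort -u merges both files and drops duplicates.
    sort -u "$1" "$2"
}
```

Feeding it one file with the two N1/N3 commits and another with the three N2 commits yields all five, which is the state every node should hold at step 7.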

Simulating a partition

I have a Riak cluster of three nodes running on Windows Azure and I needed some way to deterministically simulate a network partition scenario. The solution that I came up with was quite simple. I basically wrote some iptables scripts that temporarily firewalled certain IP traffic on the LAN in order to prevent the selected node from communicating with any other nodes.

To erect the partition:

# Simulates a network partition scenario by blocking all TCP LAN traffic for a node.
# This will prevent the node from talking to other nodes in the cluster.
# The rules are not persisted, so a restart of the iptables service (or indeed the whole box) will reset things to normal.

# First add special exemption to allow loopback traffic on the LAN interface.
# Without this, riak-admin gets confused and thinks the local node is down when it isn't.
sudo iptables -I OUTPUT -p tcp -d $(hostname -i) -j ACCEPT
sudo iptables -I INPUT -p tcp -s $(hostname -i) -j ACCEPT

# Now block all other LAN traffic.
sudo iptables -I OUTPUT 2 -p tcp -d 10.0.0.0/8 -j REJECT
sudo iptables -I INPUT 2 -p tcp -s 10.0.0.0/8 -j REJECT

To tear down the partition:

# Restarts the iptables service, thereby resetting any temporary rules that were applied to it.
sudo service iptables restart
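
Restarting the service also throws away any other temporary rules on the box. A narrower alternative, sketched below, is to delete just the four rules that were inserted, using `iptables -D` with the same rule specifications. The `IPTABLES` variable and the `teardown_partition` function are names I've invented for this sketch; setting `IPTABLES=echo` gives a dry run that only prints the arguments.

```shell
#!/bin/sh
# Sketch: remove only the four partition rules instead of restarting iptables.
# IPTABLES and teardown_partition are illustrative names, not part of any tool.
# Set IPTABLES=echo for a dry run that prints the arguments instead of applying them.

IPTABLES="${IPTABLES:-sudo iptables}"

teardown_partition() {
    # $1 = this node's LAN address (the script above obtained it via "hostname -i")
    node_ip="$1"
    # Remove the blocking rules first, then the loopback exemptions.
    $IPTABLES -D OUTPUT -p tcp -d 10.0.0.0/8 -j REJECT
    $IPTABLES -D INPUT -p tcp -s 10.0.0.0/8 -j REJECT
    $IPTABLES -D OUTPUT -p tcp -d "$node_ip" -j ACCEPT
    $IPTABLES -D INPUT -p tcp -s "$node_ip" -j ACCEPT
}
```

On the node itself this would be called as `teardown_partition "$(hostname -i)"`. I have not wired this into the test harness; the restart approach is what the tests actually use.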

Disclaimer: These are just rough shell scripts designed for use on a test bed development environment.

I then came up with a fairly neat little class that wraps up the concern of a network partition:

internal class NetworkPartition : IDisposable {
    private readonly string _nodeName;
    private bool _closed;

    private NetworkPartition(string nodeName) {
        _nodeName = nodeName;
    }

    /// <summary>
    /// Creates a temporary network partition by segregating (at the IP firewall level) a particular node from all other nodes in the cluster.
    /// </summary>
    /// <param name="nodeName">The name of the node to be network partitioned. The name must exist as a PuTTY "Saved Session".</param>
    /// <returns>An object that can be disposed when the partition is to be removed.</returns>
    public static IDisposable Create(string nodeName) {
        var np = new NetworkPartition(nodeName);
        Plink.Execute("simulate-network-partition.sh", nodeName);
        return np;
    }

    private void Close() {
        if (_closed)
            return;

        Plink.Execute("restart-iptables.sh", _nodeName);

        _closed = true;
    }

    public void Dispose() {
        Close();
    }
}

Which allows me to write test cases in the following way:

RingStatus.AssertOkay("riak03");

using (NetworkPartition.Create("riak03")) {
    RingStatus.AssertDegraded("riak03");

    // TODO: Do other stuff here whilst riak03 is network partitioned from the rest of the cluster.
}

RingReady.Wait("riak03");
RingStatus.AssertOkay("riak03");

Yes, RingStatus and RingReady aren’t documented here. But they’re pretty simple.
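
To give a flavour of what a "wait until the ring is ready" helper has to do on the node, here is a rough shell sketch that polls `riak-admin ringready` until its output starts with TRUE or a timeout expires. It is not the C# code from the project: `RINGREADY_CMD`, `wait_ring_ready` and the 60 second default are assumptions made for the example.

```shell
#!/bin/sh
# Sketch: poll a "ring ready" check until it reports TRUE or a timeout expires.
# RINGREADY_CMD, wait_ring_ready and the default timeout are illustrative names
# and values; they are not taken from the ClusterTools source.

RINGREADY_CMD="${RINGREADY_CMD:-riak-admin ringready}"

wait_ring_ready() {
    timeout_secs="${1:-60}"
    elapsed=0
    while [ "$elapsed" -lt "$timeout_secs" ]; do
        # riak-admin ringready prints a line starting with TRUE once all nodes agree.
        case "$($RINGREADY_CMD 2>/dev/null)" in
            TRUE*) return 0 ;;
        esac
        sleep 1
        elapsed=$((elapsed + 1))
    done
    # Timed out without ever seeing TRUE.
    return 1
}
```

The command is held in a variable so the loop can be exercised without a cluster, for example with `RINGREADY_CMD="echo TRUE"`.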

Obviously as part of this work I had to write a quick and dirty wrapper around the PuTTY plink.exe tool. This tool is basically a CLI version of PuTTY and it is very good for scripting automated tasks.
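
For reference, the kind of plink command line such a wrapper ends up issuing looks roughly like the sketch below. I'm assuming here that the script is a local file handed over with `-m` and that the node name is a PuTTY saved session loaded with `-load`; `PLINK` and `plink_execute` are names invented for this example, and the real wrapper may differ in detail.

```shell
#!/bin/sh
# Sketch of the plink command line a wrapper like Plink.Execute might issue.
# Assumptions: the script is a local file passed with -m, and the node name is a
# PuTTY saved session loaded with -load. PLINK and plink_execute are illustrative.

PLINK="${PLINK:-plink}"

plink_execute() {
    script="$1"
    session="$2"
    # -batch disables interactive prompts so an automated run fails instead of hanging.
    "$PLINK" -batch -load "$session" -m "$script"
}
```

So `plink_execute simulate-network-partition.sh riak03` would run the partition script on the riak03 session.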

My solution could be improved to support creating partitions that consist of more than just one node, but for me the added value of this would be very small at this stage. Maybe I will add support for this later when I get into stress testing territory!

Source

You can view it over on GitHub; the namespace is tentatively called “ClusterTools”. Bear in mind it’s currently held inside another project but if there’s enough interest I will make it standalone. There has been talk on the CorrugatedIron project (which is a Riak client library for .NET) about starting up a Contrib library, so maybe this could be one of its first contributions.

Riak timing out during startup

Posted in Databases, Unix Environment by Nathan B. Evans on February 6, 2013

Recently I’ve been working on some interesting projects involving the eventually consistent Riak key-value database. Today I encountered a puzzling issue with a fresh cluster I was deploying. I had seemingly done everything exactly as I had in the past, except something was causing it to fail to start up when in service mode.

[administrator@riak01 ~]$ sudo service riak start
Starting Riak: Riak failed to start within 15 seconds,
see the output of 'riak console' for more information.
If you want to wait longer, set the environment variable
WAIT_FOR_ERLANG to the number of seconds to wait.
[FAILED]

Bizarrely, in console mode it would start up fine. This led me to believe it was some sort of user or permissions issue but I wasn’t totally sure. Perhaps I had accidentally executed Riak as another user and some sort of locking file was created? Or was it perhaps a performance issue with the “Shared Core” Azure VMs I was using?

First I followed Riak’s advice and increased the WAIT_FOR_ERLANG environment variable, trying first 30 and then 60 seconds. But this made no difference at all. I’m not even sure Riak was using my new value, as it still kept printing “15 seconds” as its reason for failing to start.
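
One possible explanation for the unchanged “15 seconds” message, which I have not verified for this setup, is that the variable never reached the startup script at all: sudo resets the environment by default, and the `service` command on CentOS launches init scripts with a cleaned environment. The sketch below does not touch Riak; it only demonstrates the general effect using `env -i`, which starts a command with an empty environment.

```shell
#!/bin/sh
# Demonstrates how a variable exported in the calling shell is lost when a command
# is launched with a cleaned environment (env -i), and survives when passed explicitly.
# This is a generic illustration; it does not start or inspect Riak.

export WAIT_FOR_ERLANG=60

# Child started with a cleaned environment: the exported value is gone.
stripped=$(env -i PATH="$PATH" sh -c 'echo "${WAIT_FOR_ERLANG:-unset}"')

# Child started with a cleaned environment, but the value handed over explicitly.
passed=$(env -i PATH="$PATH" WAIT_FOR_ERLANG=60 sh -c 'echo "${WAIT_FOR_ERLANG:-unset}"')

echo "cleaned environment: $stripped"
echo "passed explicitly:   $passed"
```

If that is what happened here, exporting the variable in my login shell before running `sudo service riak start` would have had no effect, regardless of the value chosen.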

I researched some more and many places on the interweb were suggesting to purge the /var/lib/riak/ring/ directory (don’t do this if you have valuable data stored on the Riak instance). I tried this, but it also had no effect.

But it turned out that the solution was incredibly simple. Riak had created some sort of temporary lock directory at /tmp/riak. All I had to do was delete this directory and, hey presto, Riak would now start perfectly fine as a service!

$ sudo rm -r /tmp/riak
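
If you want to be a little more careful than a bare rm, a guarded version might look like the sketch below: it only deletes the directory when the node does not answer a ping. `PING_CMD`, `STALE_DIR` and `remove_stale_riak_tmp` are names made up for this example, and I'm assuming `riak ping` exits successfully only when the node is up. It needs to run with enough privileges to delete the directory (for example as root).

```shell
#!/bin/sh
# Sketch: only remove the stale directory when the node is confirmed not running.
# PING_CMD, STALE_DIR and remove_stale_riak_tmp are illustrative names.
# Assumption: "riak ping" exits with status 0 only when the node is responding.

PING_CMD="${PING_CMD:-riak ping}"
STALE_DIR="${STALE_DIR:-/tmp/riak}"

remove_stale_riak_tmp() {
    # If the node answers, the directory is in use, so leave it alone.
    if $PING_CMD >/dev/null 2>&1; then
        echo "node is responding; not touching $STALE_DIR" >&2
        return 1
    fi
    # Nothing to clean up if the directory is already gone.
    [ -d "$STALE_DIR" ] || return 0
    rm -r -- "$STALE_DIR"
}
```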

There may be more posts on the subject of Riak soon. 🙂

PS: I am using Riak version 1.2.1, on CentOS 6.3.
