Last night we performed a big switch-over in our data centre. We moved everything onto a new managed switch and SonicWall firewall, re-pointed and re-addressed lots and lots of servers, and basically just did a bunch of stuff we should have done yonks ago! Everything seemed to go really well except for one thing: our Hyper-V hosts were now throwing really annoying and random disconnection errors when connecting straight into a VM using its “Connect…” menu item, otherwise known as VMConnect.exe. The connection would work for at least a couple of seconds, sometimes for as long as a minute or two. But then it would barf and the following error message dialog would be displayed.
The full description of the error was as follows:
The connection to the virtual machine was lost. This can occur when a virtual machine stops unexpectedly, or when network problems occur. Try to connect again. If the problem persists, contact your system administrator.
Would you like to try to reconnect?
This was really annoying because we were connecting to local VMs hosted on the very same VM host from which we were connecting. So presumably there wouldn’t be any packets hitting the network, which should have ruled out the new hardware and network changes we had just made.
After racking my brains on it for a bit (which included firing up Wireshark to perform a sanity check), I loaded up TCPView. This is a really great little tool from Mark Russinovich’s Windows Sysinternals stable. With this tool running I then retried the VMConnect, so that I could see what socket activity it was performing.
What this showed is that even when connecting to the local VM host using “localhost” or “127.0.0.1” as the address (i.e. IPv4), the VMConnect tool was seemingly transforming this into an IPv6 address and then forming a TCPv6 connection. This was interesting.
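If you want a quick sanity check of how a hostname resolves on the host, here is a minimal Python sketch. It does not reproduce whatever VMConnect does internally (that is not something I know); it only asks the operating system's resolver which address families come back for a given name. The function name `address_families` is made up for illustration.

```python
import socket

def address_families(host, port=0):
    """Return the sorted set of address families that `host` resolves to."""
    names = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); we only need the family.
    return sorted({names.get(family, str(family)) for family, *_ in infos})

if __name__ == "__main__":
    for host in ("localhost", "127.0.0.1", "::1"):
        try:
            print(host, "->", address_families(host))
        except socket.gaierror as exc:
            print(host, "-> resolution failed:", exc)
```

If “localhost” comes back with an IPv6 entry while IPv6 is unbound from the adapter, that is at least a hint that something may try to use it.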
I immediately went to check whether IPv6 was actually enabled on the VM host’s network adapters. Lo and behold, it was not. Turns out that when we flicked over the gateway IP to point to the new firewall, we also subconsciously unticked the IPv6 protocol in the adapter’s list! A fairly innocuous thing to do, one would think, especially on an internal LAN!
So there you have it. If you come across this problem with Hyper-V, I would recommend you immediately check that you have not inadvertently disabled the IPv6 protocol on your virtual network adapter for Hyper-V.
The very moment we re-enabled IPv6, the problem with VMConnect constantly disconnecting every few seconds totally went away!
Not many problems get more obscure than this.
I’ve been setting up lots and lots of small details on our HA-Proxy cluster this week. This post is just a small digest of some of the things I have learnt.
option nolinger is considered harmful.
I read somewhere that this option should be enabled because it frees up socket resources quicker and doesn’t leave them lying around when blatantly dead. I enabled it and thought nothing more of it. Having forgotten I had done so, I then started noticing strange behaviours. Most telling was that HA-Proxy’s webstats UI would truncate abruptly before completing. Fortunately, Willy Tarreau (the author/maintainer) was very quick to respond to my pestering e-mails and after seeing my Wireshark trace he immediately had a few ideas of what could be causing it. Following his suggestion to avoid the “no linger” option, I removed it from my configuration and the problem went away.
Therefore: “option nolinger considered harmful.” You’ve been warned!
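For context, here is a rough Python sketch of what “no linger” means at the socket level. It assumes (this is my reading of the HA-Proxy documentation, not something taken from the e-mail exchange) that the option boils down to setting SO_LINGER with a zero timeout, which makes the kernel reset the connection on close instead of flushing unsent data. That would be consistent with a page being cut off part-way. The struct layout used is the Linux one; the helper names are made up.

```python
import socket
import struct

def set_zero_linger(sock):
    """Ask the kernel to reset the connection on close rather than linger."""
    # struct linger { int l_onoff; int l_linger; } -> enabled, zero-second timeout
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))

def get_linger(sock):
    """Return (l_onoff, l_linger) as currently set on the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.calcsize("ii"))
    return struct.unpack("ii", raw)

if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        print("before:", get_linger(s))
        set_zero_linger(s)
        print("after: ", get_linger(s))
```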
Webstats UI has “hidden” administrative functions
While reading the infamous “wall of text” that is the HA-Proxy documentation, I came across a neat option called “stats admin”. It enables a single piece of extra functionality (at least it does in v1.4.11) that will let you flag servers as being online or offline. This is useful if you’re planning to take one or more servers out of a backend’s pool, for maintenance possibly. I would wager that Willy intends to add more administrative features in the future, so adding this one to your config now could save you some time later.
Of course, it is not likely that you will want such a sensitive function to be exposed to everyone that uses webstats. So it is fortunate then that this option supports a condition expression. I set mine up like the following:
    userlist UsersFor_HAProxyStatistics
        group admin users admin
        user admin insecure-password godwouldntbeupthislate
        user stats insecure-password letmein

    listen HAProxy-Statistics *:81
        mode http
        stats enable
        stats uri /haproxy?stats
        stats refresh 60s
        stats show-node
        stats show-legends
        acl AuthOkay_ReadOnly http_auth(UsersFor_HAProxyStatistics)
        acl AuthOkay_Admin http_auth_group(UsersFor_HAProxyStatistics) admin
        stats http-request auth realm HAProxy-Statistics unless AuthOkay_ReadOnly
        stats admin if AuthOkay_Admin
Request/response rewriting is mutually exclusive of keep-alive connections
At least in current versions, HA-Proxy doesn’t seem to be able to perform rewriting on connections that have been kept alive. It is limited to analysing only the first request and response. Any further requests that occur on that connection will go unanalysed. So if you are doing request or response rewriting, it is imperative that you set a special option to ensure that a connection can only be used once.
In my case, I just added the following to my
Identifying your frontend from your backend
I was creating some rules to ensure that a particular URL could only be accessed through my HTTPS frontend. I wanted to prevent unencrypted HTTP access to this URL because it was using HTTP Basic authentication which uses clear text passwords across the wire.
Fortunately, HA-Proxy supports a fairly neat way of doing this by the means of tagging your frontend with a unique identifier which can then be matched against by the backend.
First of all, I set up my frontends like the following:

    frontend Public-HTTP
        id 80
        mode http
        bind *:80
        option http-server-close
        default_backend Web-Farm

    frontend Public-HTTPS
        id 8443
        mode http
        # Note: Port 8443 because the true 443 is being terminated by Stunnel, which then forwards to this 8443.
        bind *:8443
        option http-server-close
        default_backend Web-Farm
Then in my backend I cleared a space for defining “reusable” ACLs and then added the protective rule for the URL in question:
    backend Web-Farm
        mode http
        balance roundrobin
        option httpchk
        server Web0 172.16.61.181:80 check
        server Web1 172.16.61.182:80 check

        # Common/useful ACLs
        acl ViaFrontend_PublicHttp fe_id 80
        acl ViaFrontend_PublicHttps fe_id 8443

        # Application security for: /MyWebPage/
        acl PathIs_MyWebPage path_beg -i /mywebpage
        http-request deny if PathIs_MyWebPage !ViaFrontend_PublicHttps
The piece of magic that makes this all work is the fe_id ACL criterion. Note that the “fe” stands for “frontend”.
Note that the http-request deny rule combines two ACLs by boolean AND’ing them; HA-Proxy defaults to AND’ing. If you want to OR, just type “or” or “||”. Negation is done in the normal C way by using an exclamation mark, as shown in the above example. I tend to avoid the “unless” statement as I prefer the explicitness of using “if” with negation. But that’s just my personal preference as a long-time coder.
Now if a user tries to visit http://.../MyWebPage they will get a big fat ugly 403 Forbidden error.
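To make the boolean logic of that deny rule concrete, here is a small Python sketch of the same decision. It is only a model of the rule as written in this post, not of HA-Proxy’s internals; the function and constant names are invented for illustration.

```python
# Frontend ids as assigned with the "id" keyword in the frontends.
FE_PUBLIC_HTTP = 80
FE_PUBLIC_HTTPS = 8443

def path_is_mywebpage(path):
    # acl PathIs_MyWebPage path_beg -i /mywebpage
    # (case-insensitive match on the beginning of the request path)
    return path.lower().startswith("/mywebpage")

def should_deny(path, frontend_id):
    # http-request deny if PathIs_MyWebPage !ViaFrontend_PublicHttps
    # Two conditions side by side are AND'ed; "!" negates the second one.
    via_https = frontend_id == FE_PUBLIC_HTTPS
    return path_is_mywebpage(path) and not via_https

if __name__ == "__main__":
    for path, fe in [("/MyWebPage/", FE_PUBLIC_HTTP),
                     ("/MyWebPage/", FE_PUBLIC_HTTPS),
                     ("/Other/", FE_PUBLIC_HTTP)]:
        print(path, fe, "deny" if should_deny(path, fe) else "allow")
```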
HTTP Basic authentication is finally very basic to do!
I came across a stumbling block this week. I assumed that Microsoft IIS, one of the best web servers available, could do HTTP Basic authentication, i.e. clear text passwords over the wire, validated against some sort of clear text password file or database. Turns out that while IIS does support HTTP Basic auth, it doesn’t support any form of simple backend. You have to validate against either the web server’s local Windows user accounts, or against Active Directory. Great. The web page in question was just a little hacky thing we knocked up to get a customer of ours out of a hole. We didn’t want to be creating maintenance headaches for ourselves by creating a local user account on each web server in the farm, nor did we fancy creating them an AD account. They don’t even belong to our company!
Fortunately (that word again), and despite how poorly documented it is, HA-Proxy *does* support this!
First of all you need to create a userlist that will contain your users/groups that you will authenticate against:
    userlist UsersFor_AcmeCorp
        user joebloggs insecure-password letmein
Then in your backend, you need to create an ACL that uses the http_auth criterion. And lastly, create an http-request auth rule that will cause the appropriate 401 Unauthorized and WWW-Authenticate: Basic response to be generated if the authentication has failed.
    backend HttpServers
        .. normal backend stuff goes here as usual ..
        acl AuthOkay_AcmeCorp http_auth(UsersFor_AcmeCorp)
        http-request auth realm AcmeCorp if !AuthOkay_AcmeCorp
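If you have never looked at what HTTP Basic authentication actually puts on the wire, this Python sketch may help. It models the general exchange (an `Authorization: Basic` header carrying base64 of `user:password`, answered with 401 plus `WWW-Authenticate` on failure), using the same user as the userlist above. It is an illustration of the protocol, not of how HA-Proxy implements it; `check_basic_auth` and `basic_auth_header` are hypothetical helpers.

```python
import base64

# Mirrors the userlist above (clear text, hence "insecure-password").
USERLIST = {"joebloggs": "letmein"}

def basic_auth_header(user, password):
    """Build the Authorization header value a browser would send."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

def check_basic_auth(header, userlist=USERLIST, realm="AcmeCorp"):
    """Return (status, extra_headers) for a given Authorization header value."""
    challenge = (401, {"WWW-Authenticate": f'Basic realm="{realm}"'})
    if not header or not header.startswith("Basic "):
        return challenge
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # covers malformed base64 and undecodable bytes
        return challenge
    user, sep, password = decoded.partition(":")
    if sep and userlist.get(user) == password:
        return (200, {})
    return challenge

if __name__ == "__main__":
    print(check_basic_auth(None))
    print(check_basic_auth(basic_auth_header("joebloggs", "letmein")))
```

Note that base64 is an encoding, not encryption, which is exactly why the HTTPS-only rule from the previous section matters.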
Remove sensitive IIS / ASP.NET response headers
Security unconscious folk need not apply.
It’s a slight security risk to be leaking your precise IIS and ASP.NET version numbers. Whilst these can be turned off in IIS configuration, I think it is more a concern for your frontend load balancer, i.e. HA-Proxy. The reason I believe this is that the headers can be useful for debugging on the internal LAN/VPN inside your company. Only when the headers are about to touch the WAN do they become dangerous. Therefore:
    frontend Public-HTTP
        # Remove headers that expose security-sensitive information.
        rspidel ^Server:.*$
        rspidel ^X-Powered-By:.*$
        rspidel ^X-AspNet-Version:.*$
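As a rough illustration of what those three rules do, here is the same filtering expressed in Python: every response header line that matches one of the patterns (case-insensitively, which is what the “i” in rspidel stands for) is dropped. This is only a model for trying the patterns out; the sample header values are invented.

```python
import re

# The same three patterns as in the config above, matched case-insensitively.
SENSITIVE = [re.compile(p, re.IGNORECASE) for p in (
    r"^Server:.*$",
    r"^X-Powered-By:.*$",
    r"^X-AspNet-Version:.*$",
)]

def strip_sensitive_headers(header_lines):
    """Drop every header line that matches one of the sensitive patterns."""
    return [line for line in header_lines
            if not any(p.match(line) for p in SENSITIVE)]

if __name__ == "__main__":
    response_headers = [
        "Content-Type: text/html",
        "Server: Microsoft-IIS/7.5",
        "x-powered-by: ASP.NET",
        "X-AspNet-Version: 4.0.30319",
    ]
    for line in strip_sensitive_headers(response_headers):
        print(line)
```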
HTTPS and separation of concerns
I don’t know about Apache, but IIS 7.5 can have some annoying (but arguably expected) behaviours when HA-Proxy is passing traffic where the client believes it has an end-to-end HTTPS connection with the web server. My setup involves Stunnel terminating the SSL connection and then from that point on it is just standard HTTP traffic to the backend servers. This means the backend servers don’t actually need to be listening on HTTPS/443 at all. However when GET requests come in to them using the https:/ scheme they can get a bit confused (or argumentative, I’m undecided). IIS seems to like sending back a 302 redirect response, with a Location header that uses the http:/ scheme. So then of course the web browser will follow the redirect to either a URL that doesn’t exist or one which does exist but is itself merely a redirect to the https:/ scheme! Infinite loop, anyone?
The way to solve this is request rewriting, through some clever use of regular expressions.
    frontend Public-HTTPS
        id 8443
        mode http
        bind *:8443
        option http-server-close
        default_backend Web-Farm

        # Rewrite requests so that they are passed to the backend as http:/ schemed requests.
        # This may be required if the backend web servers don't like handling https schemed requests over non-https transport.
        # I didn't use this in the end - but it might come in handy in the future so I left it commented out.
        # reqirep ^(\w+\ )https:/(/.*)$ \1http:/\2

        # Rewrite responses containing a Location header with HTTP scheme using the relative path.
        # We could alternatively just rewrite the http:/ to be https:/ but then it could break off-site redirects.
        rspirep ^Location:\s*http://.*?\.acmecorp.co.tld(/.*)$ Location:\ \1
        rspirep ^Location:(.*\?\w+=)http(%3a%2f%2f.*?\.acmecorp.co.tld%2f.*)$ Location:\ \1https\2
The first rspirep in the above example is the most important. The second is something more specific to a particular web application we’re hosting that uses a ?Redirect=http://yada.yada style query string in certain places.
The rspirep rule (the i means case-insensitive matching) is very powerful. The only downside is that you do need to be fairly fluent with regular expressions. It requires only two parameters: the first is your regular expression and the second is your replacement string.
The replacement string in the second parameter supports expansion based upon indexed capture groups from the regular expression that was matched. This is useful for merging very specific pieces from the match back into the replacement string, as I am doing in the example above. They take the form of \1, \2 etc., where the number indicates the capture group index. And capture groups are denoted in the regular expression by using parentheses, if you didn’t know.
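If you want to experiment with the capture groups before touching a live proxy, the two response patterns can be tried out in Python. This sketch assumes Python’s `re` module treats these particular patterns the same way HA-Proxy’s regex engine does, which I believe holds for the constructs used here but have not verified exhaustively. The patterns are copied verbatim; the sample URLs and the `rewrite_header` helper are made up.

```python
import re

# The two rspirep patterns from the config above, kept verbatim.
# HA-Proxy's "\ " (escaped space) in the replacement is a plain space here.
RULES = [
    (re.compile(r"^Location:\s*http://.*?\.acmecorp.co.tld(/.*)$", re.IGNORECASE),
     r"Location: \1"),
    (re.compile(r"^Location:(.*\?\w+=)http(%3a%2f%2f.*?\.acmecorp.co.tld%2f.*)$", re.IGNORECASE),
     r"Location: \1https\2"),
]

def rewrite_header(line):
    """Apply each rule in order to a single response header line."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line

if __name__ == "__main__":
    samples = [
        "Location: http://www.acmecorp.co.tld/orders/42",   # on-site: made relative
        "Location: http://example.org/elsewhere",           # off-site: left alone
        "Location: /login?Redirect=http%3a%2f%2fwww.acmecorp.co.tld%2fhome",
    ]
    for line in samples:
        print(repr(rewrite_header(line)))
```

In the first rule, \1 is the path captured by `(/.*)`; in the second, \1 is everything up to and including `?Redirect=` and \2 is the encoded remainder of the URL after “http”.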
Truly “live” updates on the Webstats UI
One of the first things I noticed in the hours after deploying HA-Proxy is that the webstats counters that are held for each frontend, listen and backend are not actually updated as frequently as they perhaps ought to be. Indeed, the counters for any given connection are not accumulated until that connection has ended. This is bad if your application(s) tend to hold open long-duration connections, as it reduces the usefulness of HA-Proxy’s reporting. I’m sure there are very good performance reasons why Willy did this, as that is what is alluded to in the documentation. Fortunately there is a very simple workaround for this in the form of the contstats option.
Simply add the following to your proxy and benefit from higher accuracy webstats:

    option contstats
Until next time…
This is sort of a follow-up to the Deploying HA-Proxy + Keepalived with Mercurial for distributed config post.
During testing we were coming across an issue where the HA-Proxy instance running on the slave member of our cluster would fail to bind some of its frontend proxies:
Starting haproxy: [ALERT] : Starting proxy Public-HTTPS: cannot bind socket
After some head scratching I noticed that the problem was only arising on those proxies that explicitly defined the IP address of a virtual interface that was being managed by Keepalived (or maybe Heartbeat for you).
This is because both of these High-Availability clustering systems use a rather simplistic design whereby the “shared” virtual IP is only installed on the active node in the cluster, while the nodes that are in a dormant state (i.e. the slaves) do not actually have those virtual IPs assigned to them during that state. It’s a sort of “IP address hot-swapping” design. I learnt this by executing a simple command, first from the master server:
    $ ip a
    <snipped stuff for brevity>
    2: seth0: mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether 00:15:5d:28:7d:19 brd ff:ff:ff:ff:ff:ff
        inet 172.16.61.151/24 brd 172.16.61.255 scope global seth0
        inet 172.16.61.150/24 brd 172.16.61.255 scope global secondary seth0:0
        inet 172.16.61.159/24 brd 172.16.61.255 scope global secondary seth0:1
        inet6 fe80::215:5dff:fe28:7d19/64 scope link
           valid_lft forever preferred_lft forever
    <snipped trailing stuff for brevity>
Then again, from the slave server:
    $ ip a
    <snipped stuff for brevity>
    2: seth0: mtu 1500 qdisc pfifo_fast qlen 1000
        link/ether 00:15:5d:2d:9c:11 brd ff:ff:ff:ff:ff:ff
        inet 172.16.61.152/24 brd 172.16.61.255 scope global seth0
        inet6 fe80::215:5dff:fe2d:9c11/64 scope link
           valid_lft forever preferred_lft forever
    <snipped trailing stuff for brevity>
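If you would rather check this from a script than by eyeballing the output, a small Python sketch along these lines can pull the IPv4 addresses out of `ip a` text and tell you whether a node currently holds a given virtual IP. The sample strings are trimmed copies of the two outputs above; the helper names are made up, and the regex only handles the `inet a.b.c.d/nn` lines shown here.

```python
import re

# Matches lines such as "    inet 172.16.61.150/24 brd ..." but not "inet6 ...".
INET_RE = re.compile(r"^\s*inet (\d{1,3}(?:\.\d{1,3}){3})/\d+", re.MULTILINE)

def ipv4_addresses(ip_a_output):
    """Return every IPv4 address listed in the output of `ip a`."""
    return INET_RE.findall(ip_a_output)

def holds_virtual_ip(ip_a_output, vip):
    """True if the node whose `ip a` output this is currently has `vip` assigned."""
    return vip in ipv4_addresses(ip_a_output)

MASTER = """\
2: seth0: mtu 1500 qdisc pfifo_fast qlen 1000
    inet 172.16.61.151/24 brd 172.16.61.255 scope global seth0
    inet 172.16.61.150/24 brd 172.16.61.255 scope global secondary seth0:0
    inet 172.16.61.159/24 brd 172.16.61.255 scope global secondary seth0:1
    inet6 fe80::215:5dff:fe28:7d19/64 scope link
"""

SLAVE = """\
2: seth0: mtu 1500 qdisc pfifo_fast qlen 1000
    inet 172.16.61.152/24 brd 172.16.61.255 scope global seth0
    inet6 fe80::215:5dff:fe2d:9c11/64 scope link
"""

if __name__ == "__main__":
    for name, text in (("master", MASTER), ("slave", SLAVE)):
        print(name, ipv4_addresses(text), holds_virtual_ip(text, "172.16.61.150"))
```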
Unfortunately this behaviour can cause problems for programs like HA-Proxy which have been configured to expect specific IP addresses to exist on the server. I was considering working around it by writing some scripts that hook events within the HA cluster to handle stopping and starting HA-Proxy when needed. But this approach seemed clunky and unintuitive. So I dug a little deeper and came across a bit of a gem hidden away in the depths of the Linux networking stack. It is a simple boolean setting called “net.ipv4.ip_nonlocal_bind” and it allows a program like HA-Proxy to create listening sockets bound to IP addresses that are not currently assigned to any interface on the server. It seems made for exactly this situation.
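To see the effect for yourself, here is a hedged Python sketch that reads the current value of the setting from `/proc` and then simply tries to bind a socket to an address. On a slave node you would expect the bind to the virtual IP to fail while the setting is off and to succeed once it is on. The `/proc` path is the standard Linux location for this sysctl; the helper names and the example address in the demo are assumptions for illustration.

```python
import socket

SYSCTL_PATH = "/proc/sys/net/ipv4/ip_nonlocal_bind"

def nonlocal_bind_enabled(path=SYSCTL_PATH):
    """Return True if the kernel currently allows binding to non-local IPv4 addresses."""
    try:
        with open(path) as fh:
            return fh.read().strip() == "1"
    except OSError:
        # File missing or unreadable (e.g. not Linux): treat as not enabled.
        return False

def can_bind(ip, port=0):
    """Try to bind a TCP socket to ip:port (port 0 = any free port)."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((ip, port))
        return True
    except OSError:
        return False

if __name__ == "__main__":
    vip = "172.16.61.150"  # the virtual IP from the master's output above
    print("ip_nonlocal_bind enabled:", nonlocal_bind_enabled())
    print("can bind to", vip, ":", can_bind(vip))
```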
So in the end the fix was as simple as adding/updating the /etc/sysctl.conf file to include the following key/value pair:

    net.ipv4.ip_nonlocal_bind = 1
My previous experience of setting up these low-level High-Availability clusters was with Windows Server’s Network Load Balancing (NLB) feature. This works quite differently from Keepalived and Heartbeat. It relies upon some low-level ARP hacking/trickery and some sort of distributed time splicing algorithm. But it does ensure that each node in the cluster (whether in a master or slave position) keeps the virtual IP address(es) assigned at all times. I suppose there is always more than one way to crack an egg…