Severe Discomfort's "The Joy of UDP"

Posted: Wed, 15 November 2006

(For the grammar Nazis out there: no, that apostrophe is not misplaced)


There are times when a connectionless protocol almost doesn't seem worth the hassle.

Take, for instance, the troubles of a round-trip communication, using UDP, where the initial request is sent to a secondary IP address on an interface. Your code, on the server, might look like this:

 require 'socket'

 s = UDPSocket.new
 s.bind('', 12345)
 loop { data, remote = s.recvfrom(65535); s.send(data, 0, remote[3], remote[1]) }

This is a pretty simple Ruby fragment which just reflects your UDP packets back at you. It really doesn't get much simpler than that. Connect to this thing from somewhere else (using nc -u remoteip 12345) and you can reflect packets back and forth to your heart's content. Or so you'd think...

When talking to a regular IP address, your packets probably look something like this (when viewed using tethereal):

 192.168.0.1 -> 192.168.0.2 UDP Source port: 35716 Destination port: 12345
 192.168.0.2 -> 192.168.0.1 UDP Source port: 12345 Destination port: 35716

Fantastic -- I send a packet, and get one back. In netcat, you see whatever you typed printed back out again.

But, what if I add a secondary IP address to the server's interface (ifconfig eth0:1 192.168.0.5, for instance) and try sending packets to that IP address (with nc -u 192.168.0.5 12345)? I get this insanity:

 192.168.0.1 -> 192.168.0.5 UDP Source port: 35772 Destination port: 12345
 192.168.0.2 -> 192.168.0.1 UDP Source port: 12345 Destination port: 35772

And of course, no self-respecting network stack is going to accept a packet back from a totally different IP address, even if it did manage to get the ports right. Hence, the client never sees the return packets, and thinks that the server is snubbing it.

What's happening here seems insane, but it's actually just a regular artifact of a connectionless protocol like UDP. You see, we've bound our socket to the "any" address (the empty string in the call to #bind tells UDPSocket to bind to the "any" address -- no, I don't know why either, it's not a particularly intuitive interface). When we get the packet into our app via #recvfrom, we get no information on what address the packet was actually sent to, only that the OS decided that the packet was for us (yay for encapsulation). The only information we get is where the packet was coming from -- and because the kernel keeps no connection-related information around for the socket (the kernel figures it'll hold us to our connectionless pledge), when the time comes for us to send a packet back, we have to explicitly say where we want that packet to go -- that's the remote[3], remote[1] arguments in the #send call.
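To make that concrete, here is a minimal sketch of what #recvfrom actually hands back. It assumes nothing beyond Ruby's standard socket library and the loopback interface; the method name recvfrom_demo is made up for illustration.

```ruby
require 'socket'

# Send one datagram over loopback and return what #recvfrom reports.
def recvfrom_demo
  server = UDPSocket.new
  server.bind('127.0.0.1', 0)   # port 0: let the kernel pick a free port
  port = server.addr[1]

  client = UDPSocket.new
  client.send('hello', 0, '127.0.0.1', port)

  # remote is [address_family, port, hostname, numeric_address]
  data, remote = server.recvfrom(65535)
  [data, remote]
ensure
  server&.close
  client&.close
end

data, remote = recvfrom_demo
puts "payload: #{data.inspect}"
puts "sender:  #{remote[3]}:#{remote[1]}"
```

Everything in that array describes the sender. Nothing in it says which local address the datagram was addressed to, which is exactly the missing piece.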

The brain-leakage starts when we work out where to say the packet is coming from. When we send the packet, a source address needs to be calculated -- again, the kernel keeps no information on source or destination because we're connectionless. Because our socket is bound to the "any" address, the OS thinks it has carte blanche to set whatever source address it likes. In Linux, the behaviour appears to be to pick the primary IP address of the interface that the packet will be sent out on. So for the common case (one IP address per physical interface) everything is fine. But those corner cases, gee, they're awfully sharp and pointy...

The solution, of course, is to send the packet out through a socket that is bound to the particular address you want to use as the source address -- then the kernel knows what source address you want, and there's no confusion. Seems pretty trivial, right? Surely every sensible network application does that already, right? So obviously Ruby's UDPSocket implementation just sucks. That's what I thought, and I don't blame you if you think so too. But before you raise pitchforks and storm the gem-covered castle, try repeating our little experiment with some other widely-used UDP-based applications, such as net-snmpd or your NTP server. (The NTP server is a little different, in that to trigger the problem you need to start it and then set up the interface alias, for reasons I'll explain later). Both net-snmpd and ntpd exhibit the same behaviour as our trivial UDP echo service -- at least, the versions present in Ubuntu Breezy do, although I can't imagine everyone's suddenly cottoned on to this problem in the last 12 months or so and come out with a rash of fixes.
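As a rough sketch of the "bind to the address you want to reply from" idea, assuming only Ruby's standard socket library: the helper names (bound_echo_socket, echo_once, demo_bound_echo) are invented for illustration, and the demo runs over loopback so it can be tried without touching any interface configuration.

```ruby
require 'socket'

# Create a UDP socket bound to one specific local address, so the kernel
# uses that address as the source of anything sent through it.
def bound_echo_socket(address, port)
  sock = UDPSocket.new
  sock.bind(address, port)
  sock
end

# Reflect a single datagram back to whoever sent it.
def echo_once(sock)
  data, remote = sock.recvfrom(65535)
  sock.send(data, 0, remote[3], remote[1])
end

# Loopback round trip: returns the reply and the address it came from.
def demo_bound_echo(address = '127.0.0.1')
  server = bound_echo_socket(address, 0)   # port 0: any free port
  client = UDPSocket.new
  client.send('ping', 0, address, server.addr[1])
  echo_once(server)
  reply, from = client.recvfrom(65535)
  [reply, from[3]]
ensure
  server&.close
  client&.close
end

reply, source = demo_bound_echo
puts "got #{reply.inspect} back from #{source}"
```

On the server from the example above, bound_echo_socket('192.168.0.5', 12345) would be the socket to answer from for clients talking to the secondary address.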

So, basically, everyone seems to be suffering from the same malady. For values of "everyone" that equal "some UDP servers". There are some services that don't seem to have such troubles with alias interfaces -- the Bind DNS server comes immediately to mind, and ntpd works fine if you start the NTP server after you set up the interfaces. Why the difference? These services are clever and bind to each IP address on the system separately, like so:

 $ netstat -lun |grep :53
 udp        0      0 192.168.0.1:53          0.0.0.0:*                          
 udp        0      0 192.168.0.5:53          0.0.0.0:*                          
 udp        0      0 127.0.0.1:53            0.0.0.0:*                          

This is very cool, and solves the problem, except that if you add another interface after you start Bind, it doesn't pick up the new address, and you're boned until you restart the daemon. It also increases code complexity a bit, since you need to add a select into your request handling loop (instead of just blocking on the #recvfrom call, like I've been doing thus far). Whilst I'm never happy about unnecessary complexity, I'll deal with it where I have to, but the real killer for this application is the "you can't change your interfaces after starting the daemon" rule -- my particular need is to make requests to a virtual service IP address, and restarting my service after each IP address change is going to suck hard.
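For the curious, a socket-per-address echo server might look something like the following sketch. It is not how Bind does it internally (I haven't read that code); it just assumes Ruby's Socket.ip_address_list and IO.select, restricts itself to IPv4, and uses made-up helper names (bind_each_address, serve_once).

```ruby
require 'socket'

# Open one UDP socket per local IPv4 address, all on the same port.
def bind_each_address(port)
  Socket.ip_address_list.select(&:ipv4?).map do |info|
    sock = UDPSocket.new
    sock.bind(info.ip_address, port)
    sock
  end
end

# Wait until at least one socket is readable, then echo each pending
# datagram back through the socket it arrived on, so the reply carries
# that socket's bound address as its source.
# Returns the number of sockets serviced (0 if the timeout expired).
def serve_once(sockets, timeout = nil)
  ready, = IO.select(sockets, nil, nil, timeout)
  return 0 unless ready

  ready.each do |sock|
    data, remote = sock.recvfrom(65535)
    sock.send(data, 0, remote[3], remote[1])
  end
  ready.size
end

# Typical use:
#   sockets = bind_each_address(12345)
#   loop { serve_once(sockets) }
```

Note that the address list is read exactly once, at startup, which is the "you can't change your interfaces after starting the daemon" limitation in code form.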

I mentioned ntpd earlier because it has some extra trickery at work. Its list of bound interfaces looks like this:

 udp        0      0 10.6.66.6:123           0.0.0.0:*
 udp        0      0 192.168.250.6:123       0.0.0.0:*
 udp        0      0 172.14.16.6:123         0.0.0.0:*
 udp        0      0 172.31.62.1:123         0.0.0.0:*
 udp        0      0 192.168.37.179:123      0.0.0.0:*
 udp        0      0 10.64.64.64:123         0.0.0.0:*
 udp        0      0 127.0.0.1:123           0.0.0.0:*
 udp        0      0 0.0.0.0:123             0.0.0.0:*

It listens to every IP address explicitly plus the "any" interface. I'm not quite sure why, either -- just listening to the specific addresses works for Bind, and if you add an interface alias after starting ntpd, it exhibits exactly the same problem as the trivial UDP echo service, so I don't think that listening on "any" is really gaining it anything. Listening on "any" does make ntpd behave differently to Bind: with Bind, if you send packets to an interface that wasn't available when Bind was started, it just completely ignores you (because it lacks a socket listening for packets to that address), while with ntpd the packets hit the "any" socket and suffer from the wrong-source-address bug.

At the moment, my best chance of a solution appears to be to follow Bind and ntpd's lead and hook onto every address, but with the extra twist inspired by ntpd -- listen on the "any" address as well as all of the specific addresses. If I use any packets that arrive on the "any" socket as a trigger to rescan the available addresses (to add any new addresses that have been created since the last scan), that should solve all my problems.
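Here is a first, untested-in-anger sketch of that plan. Several things in it are my assumptions rather than settled design: the class name RescanningEcho, the use of SO_REUSEADDR so the "any" socket and the specific sockets can share a port, IPv4 only, and the decision to still answer a packet that lands on the "any" socket as a best effort (that one reply may carry the wrong source address; later packets should hit the newly bound socket). It also never drops sockets for addresses that have gone away.

```ruby
require 'socket'

class RescanningEcho
  attr_reader :port

  def initialize(port)
    @any = open_socket('0.0.0.0', port)
    @port = @any.addr[1]        # resolves port 0 to the port actually chosen
    @specific = {}              # local address => socket bound to it
    rescan
  end

  # Bind a socket for every local IPv4 address we don't have one for yet.
  def rescan
    current = Socket.ip_address_list.select(&:ipv4?).map(&:ip_address)
    (current - @specific.keys).each do |address|
      @specific[address] = open_socket(address, @port)
    end
  end

  # Service whatever is readable. A packet on the "any" socket means it was
  # sent to an address we have no specific socket for, so rescan first.
  def step(timeout = nil)
    ready, = IO.select([@any, *@specific.values], nil, nil, timeout)
    return unless ready

    ready.each do |sock|
      data, remote = sock.recvfrom(65535)
      rescan if sock.equal?(@any)
      sock.send(data, 0, remote[3], remote[1])
    end
  end

  def close
    ([@any] + @specific.values).each(&:close)
  end

  private

  # Assumption: SO_REUSEADDR on every socket lets 0.0.0.0:port coexist
  # with the address-specific binds on the same port.
  def open_socket(address, port)
    sock = UDPSocket.new
    sock.setsockopt(Socket::SOL_SOCKET, Socket::SO_REUSEADDR, true)
    sock.bind(address, port)
    sock
  end
end

# Typical use:
#   echo = RescanningEcho.new(12345)
#   loop { echo.step }
```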

Now I only need to make it work (and deal with the insanities that will no doubt result). Wish me luck. The screams of pain and frustration you hear (no matter where you are in the world) are probably mine.

