Unless you’ve been living under a firewalled rock, you know that IPv6 is coming. There’s also a good chance that you’ve heard that IPv6 doesn’t have NAT. Or, if you pay close attention to the minutiae of IPv6 development, you’ve heard that IPv6 does have NAT, but you don’t have to (and shouldn’t) use it.
So let’s say we’ll skip NAT for IPv6. Fair enough. However, let’s say you have this use case:
A bunch of containers that need Internet access…
That are running in a VM…
On your laptop…
Behind your home router!
For IPv4, you’d just layer on the NAT, right? While SIP and IPsec might have kittens trying to work through three layers of NAT, for most things it’ll Just Work.
In the Grand Future of IPv6, without NAT, how the hell do you make that happen? The answer is “Prefix Delegation”, which allows routers to “delegate” management of a chunk of address space to downstream routers, and allow those downstream routers to, in turn, delegate pieces of that chunk to downstream routers.
In the case of our not-so-hypothetical containers-in-VM-on-laptop-at-home scenario, it would look like this:
My “border router” (a DNS-323 running Debian) asks my ISP for a delegated prefix, using DHCPv6. The ISP delegates a
/64out of that is allocated to the network directly attached to the internal interface, and the rest goes into “the pool”, as
/60blocks (so I’ve got 15 of them to delegate, if required).
My laptop gets an address on the LAN between itself and the DNS-323 via stateless auto-addressing (“SLAAC”). It also uses DHCPv6 to request one of the
/60blocks from the DNS-323. The laptop puts one
/64from that block as the address space for the “virtual LAN” (actually a Linux bridge) that connects the laptop to all my VMs, and puts the other 15
/64blocks into a pool for delegation.
The VM that will be running the set of containers under test gets an address on the “all VMs virtual LAN” via SLAAC, and then requests a delegated
/64to use for the “all containers virtual LAN” (another bridge, this one running on the VM itself) that the containers will each connect to themselves.
Now, almost all of this Just Works. The current releases of ISC DHCP support prefix delegation just fine, and a bit of shell script plumbing between the client and server seals the deal – the client needs to rewrite the server’s config file to tell it the netblock from which it can delegate.
Except for one teensy, tiny problem – routing. When the DHCP server delegates a netblock to a particular machine, the routing table needs to get updated so that packets going to that netblock actually get sent to the machine the netblock was delegated to. Without that, traffic destined for the containers (or the VM) won’t actually make it to its destination, and a one-way Internet connection isn’t a whole lot of use.
I cannot understand why this problem hasn’t been tripped over before. It’s absolutely fundamental to the correct operation of the delegation system. Some people advocate running a dynamic routing protocol, but that’s a sledgehammer to crack a nut if ever I saw one.
Actually, I know this problem has been tripped over before, by OpenWrt. Their solution, however, was to use a PHP script to scan logfiles and add routes. Suffice it to say, that wasn’t an option I was keen on exploring.
Instead, I decided to patch ISC DHCP so that the server can run an external
script to add the necessary routes, and perhaps modify firewall rules – and
also to reverse the process when the delegation is released (or expired).
If anyone else wants to play around with it, I’ve put it up on
I don’t make any promises that it’s the right way to do it, necessarily,
but it works, and the script I’ve added in
shows how it can be used to good effect. By the way, if anyone knows how
pull requests work over at ISC, drop me a line. From the look of their
website, they don’t appear to accept (or at least encourage) external
So, that’s one small patch for DHCP, one giant leap for my home network.
The standard recommendation is for ISPs to delegate each end-user customer a
/64networks); my ISP is being a little conservative in “only” giving me 256
/64s. It works fine for my purposes, but if you’re an ISP getting set for deploying IPv6, make life easy on your customers and give them a