diff --git a/content/posts/new-homenet-architecture/featured.jpg b/content/posts/new-homenet-architecture/featured.jpg new file mode 100755 index 0000000..aeaf8b5 Binary files /dev/null and b/content/posts/new-homenet-architecture/featured.jpg differ diff --git a/content/posts/new-homenet-architecture/first_net.svg b/content/posts/new-homenet-architecture/first_net.svg new file mode 100644 index 0000000..8007e0b --- /dev/null +++ b/content/posts/new-homenet-architecture/first_net.svg @@ -0,0 +1,4 @@ + + + +
VMHost 1
Internet
Workload VM
Core Switch 1
Core Switch 2
Access Switch 1
Access Switch 2
Core Router 1
Edge Router 1
Access Router 1
VMHost 2
Workload VM
Core Router 2
Edge Router 2
Access Router 2
Virtual Switch 1
Virtual Switch 2
\ No newline at end of file diff --git a/content/posts/new-homenet-architecture/first_new_net.svg b/content/posts/new-homenet-architecture/first_new_net.svg new file mode 100644 index 0000000..e98b187 --- /dev/null +++ b/content/posts/new-homenet-architecture/first_new_net.svg @@ -0,0 +1,4 @@ + + + +
WiFi Router
Fabric Switch A 
Fabric Switch B 
Access Router
Internet
Edge Router
VMHost 1
Workload VM
Virtual Switch
Workload VM
VMHost 2
Workload VM
Virtual Switch
Workload VM
NAT64 Gateway
\ No newline at end of file diff --git a/content/posts/new-homenet-architecture/index.md b/content/posts/new-homenet-architecture/index.md new file mode 100644 index 0000000..cfad6b2 --- /dev/null +++ b/content/posts/new-homenet-architecture/index.md @@ -0,0 +1,238 @@ +--- +title: "New Home Network Architecture" +date: "2025-11-19T11:30:00-07:00" +description: "The architecture of my current home network" +summary: "I decided to rearchitect my home network, and this is the result." +--- + +Almost two years ago, I decided to undertake another major overhaul of my home +network. It is the latest step in the long evolution of my personal systems, +and is now in a state that I'm fairly happy with. + +## History + +I've been networking my computers together for decades. The first example that +could reasonably be called a network was around 1996, when I connected my +Amiga 1200 and Amiga 3000T together over a null-modem serial cable and ran IP +between them. I eventually got some Ethernet hardware and ran mostly simple +flat networks for a while. + +### Orthodoxy + +After working professionally designing and building networks and IT systems, I +had learned a few rules. Networks in particular always consisted of several +key elements: + +- **Three Tiers**. You needed Core, Distribution, and Access switches. This + helps to scale the network and keep things well balanced. +- **VLANs**. Every packet needs to go through a VLAN. VLANs keep the network + segregated for security, and allow for smaller broadcast domains. +- **Firewalls**. The network has to protect the vulnerable endpoints by + blocking packets that aren't permitted by policy. +- **Virtualization**. Virtualize everything to decouple the systems from the + underlying infrastructure, keeping them portable. + +Naturally, I took these ideas home and built my personal networks accordingly. + +### Something's not right + +Eventually, I had built myself a network that looked something like the +diagram below. 
I kept to the principles I was familiar with, and this was the +result. + +![First Network Diagram](first_net.svg 'My last "orthodox" network architecture') + +I had VLANs for everything coming from the VM hosts to the physical switches. +Traffic would go in and out and in again so everything would route through the +tiers the way it was supposed to. I had more VMs acting as routers than I had +VMs doing productive activity. + +When I decided that I would like to start using IPv6 in my network, everything +doubled. I kept the IPv4 and IPv6 traffic on separate VLANs, and had separate +routers for everything, doubling what's in that diagram. It didn't take me +long to notice that this wasn't working out, and I started to think about a +new approach. + +## New Design + +When I started to think about what I really needed in my home network, I +came up with several principles: + +- **IPv6 First**. Using IPv6 makes so many things simpler. No NAT. Subnets + don't run out of addresses. Link-local addressing that works, and is useful. + No DHCP. *No NAT!* +- **Zero Trust**. I'm a fan of zero-trust architectures. When you place the + responsibility for security on the network, it tends to get complicated: you + make physical design choices around isolating packets on the right segments + and getting them to the right firewalls, when the network should really just + focus on moving traffic as quickly as possible. This goal of simply getting + packets to where they need to be is how the Internet scales so well. We can + keep physical LANs simple and efficient like this, and leave the security + concerns to the endpoints. The endpoints need to be secure anyway, and we + now have more modern and effective tools to help us. +- **Network Fabric**. Rather than redoing the same 3-tier model I was used to, + I wanted to do something more efficient. 
I was inspired by an article + written at Facebook about their + ["data center fabric"](https://engineering.fb.com/2014/11/14/production-engineering/introducing-data-center-fabric-the-next-generation-facebook-data-center-network/) + architecture. This is obviously much larger than anything a home network + needs, but I thought these were good ideas that I could use. +- **Routing Protocols**. I've traditionally used [OSPF](https://en.wikipedia.org/wiki/Open_Shortest_Path_First) + in the networks I've operated. When I decided to implement IPv6 in the + network, I was using [Quagga](https://www.nongnu.org/quagga/) for the + routing software. It doesn't support OSPF areas in OSPFv3 for IPv6, which + made me reconsider. I settled on [IS-IS](https://en.wikipedia.org/wiki/IS-IS), + as it supports both IPv4 and IPv6 at the same time, and could do everything + I needed it to do. + +### Refresh, Version 1 + +My first refreshed network design looked like this: + +![New Network Diagram, Version 1](first_new_net.svg 'My first updated network architecture') + +I did away with all of the traditional complexity, and established two network +"fabrics" that all of the traffic would pass through. The fabrics do not +connect directly at layer 2; each side is separate. Each fabric is a single +flat network; there are no VLANs. + +These were the key design decisions: + +- **Routing over switching**. Every physical device connecting into the fabric + switches would conform to a basic set of requirements: + 1. It will act as a router, running IS-IS to communicate with every other + device on the fabric switches. + 2. Each endpoint uses a single loopback address as its network identity. It + advertises this address to the other nodes, as well as the subnets that + it can route to. + 3. Routes are advertised over both fabrics, enabling [ECMP](https://en.wikipedia.org/wiki/Equal-cost_multi-path_routing) + for higher availability and bandwidth. +- **IPv6 first**. 
The access routers and WiFi routers only had IPv6 subnets + available for client devices. This allowed me to do away with DHCP services + on the network, only using [SLAAC](https://en.wikipedia.org/wiki/IPv6_address#Stateless_address_autoconfiguration_(SLAAC)). + Access to IPv4-only resources was through + [DNS64 and the NAT64 gateway](https://en.wikipedia.org/wiki/IPv6_transition_mechanism#NAT64). + +## Next Generation + +At this point, I was fairly happy with the result. The network was efficient +and much easier to maintain. It was faster, thanks to ECMP and having fewer +hops. As I was using it, however, I started to think about the next set of +improvements. + +- **Single point routers**. I had only single devices acting as my edge, + access, and WiFi routers. I wanted some redundancy in case one failed, and + to make maintenance more transparent with the ability to fail over. +- **Virtual Machines**. Most of my workloads were set up as virtual machines. + I wanted to migrate to [Kubernetes](https://kubernetes.io/), as everything + I was running could run there, with many other benefits besides. +- **NAT64**. Here I was running IPv6 to get away from needing NAT, but I still + needed NAT. This setup was mostly working fine, but there were a few small + irritations: + - There are not very many NAT64 implementations. I was using [JooL](https://jool.mx/); + it's an out-of-tree Linux kernel module, and it's not really actively + developed anymore. + - The path from the NAT64 gateway out through the edge router is still IPv4, + and I still need to do NAT for IPv4 at the edge. + - Applications connecting directly to an IPv4 address, rather than resolving + a name through DNS64, weren't able to connect. + I could use [464XLAT](https://en.wikipedia.org/wiki/IPv6_transition_mechanism#464XLAT) + on endpoints that supported it, but it's yet another thing to set up. + - There's the occasional device that still doesn't support IPv6, or doesn't + support it properly. +- **BGP**. 
I was purely using IS-IS throughout the network, but Kubernetes + CNIs that work on bare metal systems like mine rely on BGP to advertise + routes into the network. I'd have to work out how to incorporate this. +- **Easier WiFi**. I was using a WiFi router running [OpenWRT](https://openwrt.org/), + connecting to both fabrics and running IS-IS just like everything else. + OpenWRT is great, but it is challenging to keep devices up-to-date. +- **Load Balancing**. I didn't have any solution for network load balancing, + for either scale or availability. + +### Refresh, Version 2 + +Incorporating the improvements I wanted to make, here is the resulting network +architecture: + +![New Network Diagram, Version 2](second_new_net.svg 'My current network architecture') + +The key changes are: + +- **Redundant Routers**. I doubled up the edge and access routers. They can + effectively divide the load and fail over when needed. +- **Anycast for Load Balancing**. I've standardized on making use of + [Anycast](https://en.wikipedia.org/wiki/Anycast) addressing for creating + load-balanced and redundant network services. I'm using this in a few ways: + - The API server for my Kubernetes cluster is on an anycast address. This + address is advertised from the three control plane nodes. + - Kubernetes `LoadBalancer` type services allocate an address from a pool + and advertise it out from any node that can accept traffic for the + service. + - My recursive DNS servers providing DNS lookups for the network are on two + anycast addresses. Each edge router runs an instance and advertises one of + the addresses; this is so I can "bootstrap" the network from the edge + routers. I also run the DNS service under Kubernetes, which advertises the + same anycast addresses using ordinary `LoadBalancer` services. +- **IS-IS and BGP**. I took a few passes at getting this right. I first tried + to move from IS-IS entirely over to BGP. 
This meant setting up peering + using IPv6 link-local addresses, which worked, but it was a bit flaky under + [FRR](https://frrouting.org/). I settled on running IS-IS on the fabric + interfaces only, purely to exchange the IPv6 loopback address of each node. I + use the globally routable loopback addresses for the BGP peering, which is + much easier in practice. All of the other routes (access subnets, Kubernetes + networks, anycast addresses, defaults from the edge routers) are exchanged + using BGP. +- **No NAT64**. I decided to do away with NAT64 and provide dual-stack + connectivity to the access networks. I set up [Kea](https://www.isc.org/kea/) + as a cluster on the two access routers, which is thankfully rather low + maintenance. +- **BGP Extended-Nexthop**. An added bonus to using BGP the way that I am is + that I could make use of the [BGP extended-nexthop](https://datatracker.ietf.org/doc/html/rfc8950) + capability. The old network with only IS-IS still required me to define IPv4 + subnets on the switching fabrics, since nodes would use IPv4 addresses as the + next-hop gateways for IPv4 routes. With the extended-nexthop capability in + BGP, routers use IPv6 link-local addresses as the next hop for both IPv4 and + IPv6 routes, so the fabrics no longer need IPv4 subnets at all. + +### High Availability + +To migrate from single routers to redundant pairs, I needed to figure out a +few things. + +#### Default Routes + +With a single edge router, this was easy. With two, it's a bit of a puzzle. My +ISP doesn't actually provide fully routed IPv6 connectivity with my class of +service. I do get static IPv4 addresses, however. I've been using Hurricane +Electric's [tunnel broker](https://tunnelbroker.net/) service to get a routed +`/48` IPv6 subnet. + +With a pair of edge routers, I've set them up with four static IPv4 addresses +on their Internet-facing interfaces. Each router gets one address of its own. 
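+Failing the shared addresses over between the routers is VRRP's job. As a +rough sketch only (the interface name, virtual router IDs, priorities, and +documentation-range addresses here are illustrative assumptions, not my actual +values), one router's [keepalived](https://www.keepalived.org/) configuration +could look something like this: + +``` +# Edge router A; router B runs the mirror image, with the +# MASTER/BACKUP states and priorities swapped. +vrrp_instance tunnel_endpoint { +    state MASTER            # normally owns the IPv6 tunnel endpoint address +    interface eth0          # Internet-facing interface (assumed name) +    virtual_router_id 51 +    priority 150 +    advert_int 1 +    virtual_ipaddress { +        203.0.113.10/29     # address the IPv6 tunnel terminates on +    } +} + +vrrp_instance ipv4_traffic { +    state BACKUP            # the peer router is MASTER for this address +    interface eth0 +    virtual_router_id 52 +    priority 100 +    advert_int 1 +    virtual_ipaddress { +        203.0.113.11/29     # address used for outbound IPv4 traffic and NAT +    } +} +``` + +With the roles mirrored on the peer, each router normally owns one of the two +shared addresses, and either can take over both if the other fails. 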
I then have +two [VRRP](https://en.wikipedia.org/wiki/Virtual_Router_Redundancy_Protocol) +interfaces: one that I use to terminate the IPv6 tunnel, and the other that I +use for all IPv4 traffic. When both routers are up and running, one will have +the IPv6 tunnel and the other will have the IPv4 interface. Each one +advertises a default route for the address family it's taking care of. If one +goes down, the interface will fail over and everything reconverges rather +quickly. IPv6 connections are unaffected, as the routing is stateless and +traffic continues to flow normally. IPv4 connections may get interrupted, as +the NAT state is lost. + +#### Access Routers + +The interfaces facing the client machines provide connectivity for both IPv4 +and IPv6. + +The IPv6 configuration is much simpler. FRR can be configured to send router +advertisements on the subnets. Both routers are configured to advertise their +presence, as well as the subnet prefixes and DNS information. Client machines +will pick these up, and then have both routers as their default gateways. + +For IPv4, I need to run VRRP to share a `".1"` address on the subnet that DHCP +advertises as the default gateway. This works, though much less elegantly than +the IPv6 configuration. + +## It Works + +After I got this all in place, it was finally possible to build myself a +working Kubernetes cluster and migrate all of my old services over to it. +I'll get into that adventure in the next series of articles. diff --git a/content/posts/new-homenet-architecture/second_new_net.svg b/content/posts/new-homenet-architecture/second_new_net.svg new file mode 100644 index 0000000..dfca4cf --- /dev/null +++ b/content/posts/new-homenet-architecture/second_new_net.svg @@ -0,0 +1,4 @@ + + + +
Internet
Kubernetes Node
Edge Routers
Access Routers
Access Switch
AP Switch
Access Points
Kubernetes Node
Kubernetes Node
Fabric Switch A
Fabric Switch B
\ No newline at end of file