Publish "New Home Network Architecture" #1
New files added:

- content/posts/new-homenet-architecture/featured.jpg (binary image, 6.1 MiB)
- content/posts/new-homenet-architecture/first_net.svg (349 KiB)
- content/posts/new-homenet-architecture/first_new_net.svg (307 KiB)
- content/posts/new-homenet-architecture/index.md (241 lines)

content/posts/new-homenet-architecture/index.md:
---
title: "New Home Network Architecture"
date: "2025-11-20T07:30:00-07:00"
description: "The architecture of my current home network"
summary: "I decided to rearchitect my home network, and this is the result."
---

Almost two years ago, I decided to undertake another major overhaul of my home
network. It is the latest step in the long evolution of my personal systems,
and is now in a state that I'm fairly happy with.

## History

I've been networking my computers together for decades. The first example that
could reasonably be called a network was around 1996, when I connected my
Amiga 1200 and Amiga 3000T together over a null-modem serial cable and ran IP
between them. I eventually got some Ethernet hardware and ran mostly simple
flat networks for a while.

### Orthodoxy

After working professionally designing and building networks and IT systems, I
had learned a few rules. Networks in particular always consisted of several
key elements:

- **Three Tiers**. You needed Core, Distribution, and Access switches. This
  helps to scale the network and keep things well balanced.
- **VLANs**. Every packet needs to go through a VLAN. VLANs keep the network
  segregated for security, and allow for smaller broadcast domains.
- **Firewalls**. The network has to protect the vulnerable endpoints by
  blocking packets that aren't permitted by policy.
- **Virtualization**. Virtualize everything to decouple the systems from the
  underlying infrastructure, keeping them portable.

Naturally, I took these ideas home and built my personal networks accordingly.

### Something's not right

Eventually, I had built myself a network that looked something like the
diagram below. I kept to the principles I was familiar with, and this was the
result.



I had VLANs for everything coming from the VM hosts to the physical switches.
Traffic would loop through the tiers (in, out, and back in), routing
everything like it's supposed to, but this redundancy introduced unnecessary
complexity. I had more VMs acting as routers than I had VMs doing productive
work.

When I decided that I would like to start using IPv6 in my network, everything
doubled. I kept the IPv4 and IPv6 traffic on separate VLANs, and had separate
routers for everything, doubling what's in that diagram. It didn't take me
long to notice that this wasn't working out, and I started to think about a
new approach.

## New Design

When I started to think about what I really needed in my home network, I
came up with several principles:

- **IPv6 First**. Using IPv6 makes so many things simpler. No NAT. Subnets
  don't run out of addresses. Link-local addressing that works, and is useful.
  No DHCP. *No NAT!*
- **Zero Trust**. I'm a fan of zero-trust architectures. When you place the
  responsibility for security on the network, it tends to get complicated. You
  end up making physical design choices around isolating packets on the right
  segments and getting them to the right firewalls, when the network should
  really focus on moving traffic through it as quickly as possible. This
  principle of simply getting packets to where they need to be is how the
  Internet scales so well. We can keep physical LANs simple and efficient like
  this, and leave the security concerns to the endpoints. The endpoints need
  to be secure anyway, and we now have more modern and effective tools to help
  us.
- **Network Fabric**. Rather than redoing the same 3-tier model I was used to,
  I wanted to do something more efficient. I was inspired by an article
  written at Facebook about their
  ["data center fabric"](https://engineering.fb.com/2014/11/14/production-engineering/introducing-data-center-fabric-the-next-generation-facebook-data-center-network/)
  architecture. This is obviously much larger than what anyone needs in a home
  network, but I thought that these were good ideas that I could use.
- **Routing Protocols**. I've traditionally used [OSPF](https://en.wikipedia.org/wiki/Open_Shortest_Path_First)
  in the networks I've operated. When I decided to implement IPv6 in the
  network, I was using [Quagga](https://www.nongnu.org/quagga/) for the
  routing software. It doesn't support OSPF areas in OSPFv3 for IPv6, which
  made me reconsider. I settled on [IS-IS](https://en.wikipedia.org/wiki/IS-IS),
  as it supports both IPv4 and IPv6 at the same time, and could do everything
  that I needed it to do.

### Refresh, Version 1

My first refreshed network design looked like this:



I did away with all of the traditional complexity, and established two network
"fabrics" that all of the traffic would pass through. The fabrics do not
connect directly at layer 2; each side is separate. Each fabric is a single
flat network with no VLANs.

These were the key design decisions:

- **Routing over switching**. Every physical device connecting into the fabric
  switches would conform to a basic set of requirements (a rough configuration
  sketch follows this list):
  1. It will act as a router, running IS-IS to communicate with every other
     device on the fabric switches.
  2. Each endpoint uses a single loopback address as its network identity. It
     advertises this address to the other nodes, as well as the subnets that
     it can route to.
  3. Routes are advertised over both fabrics, enabling [ECMP](https://en.wikipedia.org/wiki/Equal-cost_multi-path_routing)
     for higher availability and bandwidth.
- **IPv6 first**. The access routers and WiFi routers only had IPv6 subnets
  available for client devices. This allowed me to do away with DHCP services
  on the network, only using [SLAAC](https://en.wikipedia.org/wiki/IPv6_address#Stateless_address_autoconfiguration_(SLAAC)).
  Access to IPv4-only resources was through the use of
  [DNS64 and the NAT64 gateway](https://en.wikipedia.org/wiki/IPv6_transition_mechanism#NAT64).
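
To make that concrete, a fabric node's configuration could look something like
the sketch below, in FRR's configuration syntax (the interface names, IS-IS
NET, and addresses are placeholders, not real values):

```
! The loopback carries the node's identity addresses, advertised into IS-IS.
! Other locally routed subnets can be advertised the same way (passive).
interface lo
 ip address 10.0.0.11/32
 ipv6 address 2001:db8:0:11::1/128
 ip router isis fabric
 ipv6 router isis fabric
 isis passive
!
! One interface into each fabric switch.
interface eth0
 description Fabric A
 ip address 10.1.0.11/24
 ip router isis fabric
 ipv6 router isis fabric
!
interface eth1
 description Fabric B
 ip address 10.2.0.11/24
 ip router isis fabric
 ipv6 router isis fabric
!
router isis fabric
 net 49.0001.0000.0000.0011.00
 is-type level-2-only
```

Because every node learns the same routes over both fabric interfaces at equal
cost, the kernel installs both next hops and spreads traffic across them
(ECMP).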

## Next Generation

At this point, I was fairly happy with the result. The network was efficient
and much easier to maintain. It was faster, thanks to ECMP and having fewer
hops. As I was using it, however, I started to think about the next set of
improvements.

- **Single point routers**. I had only single devices acting as my edge,
  access, and WiFi routers. I wanted some redundancy in case one failed, and
  to make maintenance more transparent with the ability to fail over.
- **Virtual Machines**. Most of my workloads were set up as virtual machines.
  I wanted to migrate to [Kubernetes](https://kubernetes.io/) as everything
  I was running could be run there, along with many other benefits.
- **NAT64**. Here I was running IPv6 to get away from needing NAT, but I still
  needed NAT. This setup was mostly working fine, but there were a few small
  irritations:
  - There are not very many NAT64 implementations. I was using [Jool](https://jool.mx/);
    it's an out-of-tree Linux kernel module, and it's not really actively
    developed anymore.
  - The path from the NAT64 gateway out through the edge router is still IPv4,
    and I still need to do NAT for IPv4 at the edge.
  - Applications that connect directly to an IPv4 address, rather than a
    hostname, weren't able to, since DNS64 only helps when names are resolved
    through DNS. I could use [464XLAT](https://en.wikipedia.org/wiki/IPv6_transition_mechanism#464XLAT)
    on endpoints that supported it, but it's yet another thing to set up.
  - There's the occasional device that still doesn't support IPv6, or doesn't
    support it properly.
- **BGP**. I was purely using IS-IS throughout the network, but Kubernetes
  CNIs that work on bare metal systems like mine rely on BGP to advertise
  routes into the network. I'd have to work out how to incorporate this.
- **Easier WiFi**. I was using a WiFi router running [OpenWrt](https://openwrt.org/),
  connecting to both fabrics and running IS-IS just like everything else.
  OpenWrt is great, but it is challenging to keep devices up to date.
- **Load Balancing**. I didn't have any solution for network load balancing
  for scale and availability.

### Refresh, Version 2

Incorporating the improvements I wanted to make, here is the resulting network
architecture:



The key changes are:

- **Redundant Routers**. I doubled up the edge and access routers. They can
  effectively divide the load and fail over when needed.
- **Anycast for Load Balancing**. I've standardized on making use of
  [Anycast](https://en.wikipedia.org/wiki/Anycast) addressing for creating
  load-balanced and redundant network services. I'm using this a few ways:
  - The API server for my Kubernetes cluster is on an anycast address. This
    address is advertised from the three control plane nodes.
  - Kubernetes `LoadBalancer` type services allocate an address from a pool
    and advertise it out from any node that can accept traffic for the
    service.
  - My recursive DNS servers providing DNS lookups for the network are on two
    anycast addresses. Each edge router runs an instance and advertises one of
    the addresses; this is so I can "bootstrap" the network from the edge
    routers. I also run the DNS service under Kubernetes, which advertises the
    same anycast addresses using ordinary `LoadBalancer` services.
- **IS-IS and BGP**. I took a few passes at getting this right. I first tried
  to move fully from IS-IS to BGP only. This meant setting up peering
  using IPv6 link-local addresses, which worked, but it was a bit flaky under
  [FRR](https://frrouting.org/). I settled on using IS-IS on the fabric
  interfaces solely to exchange the IPv6 loopback addresses of each node. I
  use the globally routable loopback addresses for the BGP peering, which is
  much easier in practice. All of the other routes (access subnets, Kubernetes
  networks, anycast addresses, defaults from the edge routers) are exchanged
  using BGP (a rough configuration sketch follows this list).
- **No NAT64**. I decided to do away with NAT64 and provide dual-stack
  connectivity to the access networks. I set up [Kea](https://www.isc.org/kea/)
  as a cluster on the two access routers, which is thankfully rather low
  maintenance.
- **BGP Extended-Nexthop**. An added bonus to using BGP the way that I am is
  that I could make use of the [BGP extended-nexthop](https://datatracker.ietf.org/doc/html/rfc8950)
  capability. The old network with only IS-IS still required me to define IPv4
  subnets on the switching fabrics, because nodes used IPv4 addresses as the
  next-hop gateway addresses for IPv4 routes. With the extended-nexthop
  capability in BGP, the IPv6 link-local addresses are used as the next hop
  for both IPv4 and IPv6 routes.
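
Roughly, the combination of IS-IS for the loopbacks, BGP peering over those
loopbacks with extended-nexthop, and an anycast address originated into BGP
could look like this on one node (ASN, addresses, and interface names are
examples only, not my actual values):

```
! IS-IS runs only on the fabric interfaces and only carries the loopbacks.
interface lo
 ipv6 address 2001:db8:0:1::1/128
 ! An anycast service address (e.g. recursive DNS), originated into BGP
 ! below; other nodes advertising the same /128 provide the redundancy.
 ipv6 address 2001:db8:0:53::1/128
 ipv6 router isis fabric
 isis passive
!
interface eth0
 ipv6 router isis fabric
!
interface eth1
 ipv6 router isis fabric
!
router isis fabric
 net 49.0001.0000.0000.0001.00
 is-type level-2-only
!
! BGP peers over the globally routable loopbacks. The extended-nexthop
! capability (RFC 8950) lets IPv4 routes be carried with IPv6 next hops,
! so the fabrics themselves need no IPv4 subnets.
router bgp 64512
 neighbor 2001:db8:0:2::1 remote-as 64512
 neighbor 2001:db8:0:2::1 update-source lo
 neighbor 2001:db8:0:2::1 capability extended-nexthop
 address-family ipv4 unicast
  neighbor 2001:db8:0:2::1 activate
  redistribute connected
 exit-address-family
 address-family ipv6 unicast
  neighbor 2001:db8:0:2::1 activate
  network 2001:db8:0:53::1/128
 exit-address-family
```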

### High Availability

To migrate from single routers to redundant pairs, I needed to figure out a
few things.

#### Default Routes

With a single edge router, this was easy. With two, it's a bit of a puzzle. My
ISP doesn't actually provide fully routed IPv6 connectivity with my class of
service. I do get static IPv4 addresses, however. I've been using Hurricane
Electric's [tunnel broker](https://tunnelbroker.net/) service to get a routed
`/48` IPv6 subnet.

With a pair of edge routers, I've set them up with four static IPv4 addresses
on their Internet-facing interfaces. Each router gets one address of its own,
and the remaining two are used for two [VRRP](https://en.wikipedia.org/wiki/Virtual_Router_Redundancy_Protocol)
interfaces: one that I use to terminate the IPv6 tunnel, and the other I use
for all IPv4 traffic. When both routers are up and running, one will have the
IPv6 tunnel and the other will have the IPv4 interface. Each one advertises a
default route for the address family it's taking care of. If one goes down,
the interface will fail over and everything reconverges rather quickly. IPv6
connections are unaffected, as the routing is stateless and traffic continues
to flow normally. IPv4 connections may get interrupted as the NAT state is
lost.
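
I won't reproduce the exact setup here, but a sketch of the idea using FRR's
VRRP support could look like this on one of the two routers (the peer runs the
same configuration with the priorities swapped; addresses are placeholders,
and FRR's vrrpd also expects matching macvlan devices on the Linux side, which
are omitted):

```
interface wan0
 ! This router's own static address.
 ip address 192.0.2.10/24
 ! Virtual address used for all IPv4 traffic (and its NAT).
 vrrp 14 version 3
 vrrp 14 priority 200
 vrrp 14 ip 192.0.2.12
 ! Virtual address that terminates the IPv6 tunnel.
 vrrp 16 version 3
 vrrp 16 priority 100
 vrrp 16 ip 192.0.2.13
```

With the priorities mirrored on the second router, each one normally owns one
of the two virtual addresses, and either can take over both if its peer goes
down.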

#### Access Routers

The interfaces facing the client machines provide connectivity for both IPv4
and IPv6.

The IPv6 configuration is much simpler. FRR can be configured to send router
advertisements to the subnets. Both routers are configured to advertise their
presence, as well as the subnet prefixes and DNS information. Client machines
will pick these up, and then have both routers as their default gateways.
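
As a rough example, the FRR side of those router advertisements could look
like this (the prefix and DNS addresses are placeholders):

```
interface lan0
 ipv6 address 2001:db8:10::1/64
 ! Send router advertisements so clients configure themselves with SLAAC.
 no ipv6 nd suppress-ra
 ipv6 nd prefix 2001:db8:10::/64
 ! Advertise the recursive DNS servers (the anycast addresses) via RDNSS.
 ipv6 nd rdnss 2001:db8:0:53::1
 ipv6 nd rdnss 2001:db8:0:53::2
```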

While IPv6 configuration is seamless, IPv4 relies on VRRP to share a `.1`
default gateway address, which, though functional, lacks the elegance of
IPv6's stateless design.

## It Works

After I got this all in place, it was finally possible to build myself a
working Kubernetes cluster and migrate all of my old services over to it. The
transition to Kubernetes not only streamlined service management but also laid
the foundation for future scalability and automation. I'll get into that
adventure in the next series of articles.