Publish "New Home Network Architecture" #1
New files added:

- content/posts/new-homenet-architecture/featured.jpg (binary image, 6.1 MiB)
- content/posts/new-homenet-architecture/first_net.svg (349 KiB)
- content/posts/new-homenet-architecture/first_new_net.svg (307 KiB)
- content/posts/new-homenet-architecture/index.md (241 lines)

content/posts/new-homenet-architecture/index.md:
---
title: "New Home Network Architecture"
date: "2025-11-20T07:30:00-07:00"
description: "The architecture of my current home network"
summary: "I decided to rearchitect my home network, and this is the result."
---

Almost two years ago, I decided to undertake another major overhaul of my home
network. It is the latest step in the long evolution of my personal systems,
and is now in a state that I'm fairly happy with.

## History

I've been networking my computers together for decades. The first example that
could reasonably be called a network was around 1996, when I connected my
Amiga 1200 and Amiga 3000T together over a null-modem serial cable and ran IP
between them. I eventually got some Ethernet hardware and ran mostly simple
flat networks for a while.

### Orthodoxy

After working professionally designing and building networks and IT systems, I
had learned a few rules. Networks in particular always consisted of several
key elements:

- **Three Tiers**. You needed Core, Distribution, and Access switches. This
  helps to scale the network and keep things well balanced.
- **VLANs**. Every packet needs to go through a VLAN. VLANs keep the network
  segregated for security, and allow for smaller broadcast domains.
- **Firewalls**. The network has to protect the vulnerable endpoints by
  blocking packets that aren't permitted by policy.
- **Virtualization**. Virtualize everything to decouple the systems from the
  underlying infrastructure, keeping them portable.

Naturally, I took these ideas home and built my personal networks accordingly.

### Something's not right

Eventually, I had built myself a network that looked something like the
diagram below. I kept to the principles I was familiar with, and this was the
result.



I had VLANs for everything coming from the VM hosts to the physical switches.
Traffic would loop through the tiers (in, out, and back in), routing
everything like it's supposed to, but this redundancy introduced unnecessary
complexity. I had more VMs acting as routers than I had VMs doing productive
work.

When I decided that I would like to start using IPv6 in my network, everything
doubled. I kept the IPv4 and IPv6 traffic on separate VLANs, and had separate
routers for everything, doubling what's in that diagram. It didn't take me
long to notice that this wasn't working out, and I started to think about a
new approach.

## New Design

When I started to think about what I really needed in my home network, I
came up with several principles:

- **IPv6 First**. Using IPv6 makes so many things simpler. No NAT. Subnets
  don't run out of addresses. Link-local addressing that works, and is useful.
  No DHCP. *No NAT!*
- **Zero Trust**. I'm a fan of zero-trust architectures. When you place the
  responsibility for security on the network, it tends to get complicated. You
  end up making physical design choices around isolating packets on the right
  segments and getting them to the right firewalls, when the network should
  really focus on moving traffic through it as quickly as possible. This
  principle of simply getting packets to where they need to be is how the
  Internet scales so well. We can keep physical LANs simple and efficient like
  this, and leave the security concerns to the endpoints. The endpoints need
  to be secure anyway, and we now have more modern and effective tools to help
  us.
- **Network Fabric**. Rather than redoing the same 3-tier model I was used to,
  I wanted to do something more efficient. I was inspired by an article
  written at Facebook about their
  ["data center fabric"](https://engineering.fb.com/2014/11/14/production-engineering/introducing-data-center-fabric-the-next-generation-facebook-data-center-network/)
  architecture. This is obviously much larger than what anyone needs in a home
  network, but I thought that these were good ideas that I could use.
- **Routing Protocols**. I've traditionally used [OSPF](https://en.wikipedia.org/wiki/Open_Shortest_Path_First)
  in the networks I've operated. When I decided to implement IPv6 in the
  network, I was using [Quagga](https://www.nongnu.org/quagga/) for the
  routing software. It doesn't support OSPF areas in OSPFv3 for IPv6, which
  made me reconsider. I settled on [IS-IS](https://en.wikipedia.org/wiki/IS-IS),
  as it supports both IPv4 and IPv6 at the same time, and could do everything
  that I needed it to do.

### Refresh, Version 1

My first refreshed network design looked like this:



I did away with all of the traditional complexity, and established two network
"fabrics" that all of the traffic would pass through. The fabrics do not
connect directly at layer 2; each side is separate. Each fabric is a single
flat network with no VLANs.

These were the key design decisions:

- **Routing over switching**. Every physical device connecting into the fabric
  switches would conform to a basic set of requirements (a rough configuration
  sketch follows this list):
  1. It will act as a router, running IS-IS to communicate with every other
     device on the fabric switches.
  2. Each endpoint uses a single loopback address as its network identity. It
     advertises this address to the other nodes, as well as the subnets that
     it can route to.
  3. Routes are advertised over both fabrics, enabling [ECMP](https://en.wikipedia.org/wiki/Equal-cost_multi-path_routing)
     for higher availability and bandwidth.
- **IPv6 first**. The access routers and WiFi routers only had IPv6 subnets
  available for client devices. This allowed me to do away with DHCP services
  on the network, only using [SLAAC](https://en.wikipedia.org/wiki/IPv6_address#Stateless_address_autoconfiguration_(SLAAC)).
  Access to IPv4-only resources was through the use of
  [DNS64 and the NAT64 gateway](https://en.wikipedia.org/wiki/IPv6_transition_mechanism#NAT64).
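
To make that concrete, a fabric node's configuration could look something like
the sketch below, in FRR's configuration syntax (the interface names, IS-IS
NET, and addresses are placeholders, not real values):

```
! The loopback carries the node's identity addresses, advertised into IS-IS.
! Other locally routed subnets can be advertised the same way (passive).
interface lo
 ip address 10.0.0.11/32
 ipv6 address 2001:db8:0:11::1/128
 ip router isis fabric
 ipv6 router isis fabric
 isis passive
!
! One interface into each fabric switch.
interface eth0
 description Fabric A
 ip address 10.1.0.11/24
 ip router isis fabric
 ipv6 router isis fabric
!
interface eth1
 description Fabric B
 ip address 10.2.0.11/24
 ip router isis fabric
 ipv6 router isis fabric
!
router isis fabric
 net 49.0001.0000.0000.0011.00
 is-type level-2-only
```

Because every node learns the same routes over both fabric interfaces at equal
cost, the kernel installs both next hops and spreads traffic across them
(ECMP).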

## Next Generation

At this point, I was fairly happy with the result. The network was efficient
and much easier to maintain. It was faster, thanks to ECMP and having fewer
hops. As I was using it, however, I started to think about the next set of
improvements.

- **Single point routers**. I had only single devices acting as my edge,
  access, and WiFi routers. I wanted some redundancy in case one failed, and
  to make maintenance more transparent with the ability to fail over.
- **Virtual Machines**. Most of my workloads were set up as virtual machines.
  I wanted to migrate to [Kubernetes](https://kubernetes.io/) as everything
  I was running could be run there, along with many other benefits.
- **NAT64**. Here I was running IPv6 to get away from needing NAT, but I still
  needed NAT. This setup was mostly working fine, but there were a few small
  irritations:
  - There are not very many NAT64 implementations. I was using [Jool](https://jool.mx/);
    it's an out-of-tree Linux kernel module, and it's not really actively
    developed anymore.
  - The path from the NAT64 gateway out through the edge router is still IPv4,
    and I still need to do NAT for IPv4 at the edge.
  - Applications that connect directly to an IPv4 address, rather than a
    hostname, weren't able to, since DNS64 only helps when names are resolved
    through DNS. I could use [464XLAT](https://en.wikipedia.org/wiki/IPv6_transition_mechanism#464XLAT)
    on endpoints that supported it, but it's yet another thing to set up.
  - There's the occasional device that still doesn't support IPv6, or doesn't
    support it properly.
- **BGP**. I was purely using IS-IS throughout the network, but Kubernetes
  CNIs that work on bare metal systems like mine rely on BGP to advertise
  routes into the network. I'd have to work out how to incorporate this.
- **Easier WiFi**. I was using a WiFi router running [OpenWrt](https://openwrt.org/),
  connecting to both fabrics and running IS-IS just like everything else.
  OpenWrt is great, but it is challenging to keep devices up to date.
- **Load Balancing**. I didn't have any solution for network load balancing
  for scale and availability.

### Refresh, Version 2

Incorporating the improvements I wanted to make, here is the resulting network
architecture:



The key changes are:

- **Redundant Routers**. I doubled up the edge and access routers. They can
  effectively divide the load and fail over when needed.
- **Anycast for Load Balancing**. I've standardized on making use of
  [Anycast](https://en.wikipedia.org/wiki/Anycast) addressing for creating
  load-balanced and redundant network services. I'm using this a few ways:
  - The API server for my Kubernetes cluster is on an anycast address. This
    address is advertised from the three control plane nodes.
  - Kubernetes `LoadBalancer` type services allocate an address from a pool
    and advertise it out from any node that can accept traffic for the
    service.
  - My recursive DNS servers providing DNS lookups for the network are on two
    anycast addresses. Each edge router runs an instance and advertises one of
    the addresses; this is so I can "bootstrap" the network from the edge
    routers. I also run the DNS service under Kubernetes, which advertises the
    same anycast addresses using ordinary `LoadBalancer` services.
- **IS-IS and BGP**. I took a few passes at getting this right. I first tried
  to move fully from IS-IS to BGP only. This meant setting up peering
  using IPv6 link-local addresses, which worked, but it was a bit flaky under
  [FRR](https://frrouting.org/). I settled on using IS-IS on the fabric
  interfaces solely to exchange the IPv6 loopback addresses of each node. I
  use the globally routable loopback addresses for the BGP peering, which is
  much easier in practice. All of the other routes (access subnets, Kubernetes
  networks, anycast addresses, defaults from the edge routers) are exchanged
  using BGP (a rough configuration sketch follows this list).
- **No NAT64**. I decided to do away with NAT64 and provide dual-stack
  connectivity to the access networks. I set up [Kea](https://www.isc.org/kea/)
  as a cluster on the two access routers, which is thankfully rather low
  maintenance.
- **BGP Extended-Nexthop**. An added bonus to using BGP the way that I am is
  that I could make use of the [BGP extended-nexthop](https://datatracker.ietf.org/doc/html/rfc8950)
  capability. The old network with only IS-IS still required me to define IPv4
  subnets on the switching fabrics, because nodes used IPv4 addresses as the
  next-hop gateway addresses for IPv4 routes. With the extended-nexthop
  capability in BGP, the IPv6 link-local addresses are used as the next hop
  for both IPv4 and IPv6 routes.
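
Roughly, the combination of IS-IS for the loopbacks, BGP peering over those
loopbacks with extended-nexthop, and an anycast address originated into BGP
could look like this on one node (ASN, addresses, and interface names are
examples only, not my actual values):

```
! IS-IS runs only on the fabric interfaces and only carries the loopbacks.
interface lo
 ipv6 address 2001:db8:0:1::1/128
 ! An anycast service address (e.g. recursive DNS), originated into BGP
 ! below; other nodes advertising the same /128 provide the redundancy.
 ipv6 address 2001:db8:0:53::1/128
 ipv6 router isis fabric
 isis passive
!
interface eth0
 ipv6 router isis fabric
!
interface eth1
 ipv6 router isis fabric
!
router isis fabric
 net 49.0001.0000.0000.0001.00
 is-type level-2-only
!
! BGP peers over the globally routable loopbacks. The extended-nexthop
! capability (RFC 8950) lets IPv4 routes be carried with IPv6 next hops,
! so the fabrics themselves need no IPv4 subnets.
router bgp 64512
 neighbor 2001:db8:0:2::1 remote-as 64512
 neighbor 2001:db8:0:2::1 update-source lo
 neighbor 2001:db8:0:2::1 capability extended-nexthop
 address-family ipv4 unicast
  neighbor 2001:db8:0:2::1 activate
  redistribute connected
 exit-address-family
 address-family ipv6 unicast
  neighbor 2001:db8:0:2::1 activate
  network 2001:db8:0:53::1/128
 exit-address-family
```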

### High Availability

To migrate from single routers to redundant pairs, I needed to figure out a
few things.

#### Default Routes

With a single edge router, this was easy. With two, it's a bit of a puzzle. My
ISP doesn't actually provide fully routed IPv6 connectivity with my class of
service. I do get static IPv4 addresses, however. I've been using Hurricane
Electric's [tunnel broker](https://tunnelbroker.net/) service to get a routed
`/48` IPv6 subnet.

With a pair of edge routers, I've set them up with four static IPv4 addresses
on their Internet-facing interfaces. Each router gets one address of its own,
and the remaining two are used for two [VRRP](https://en.wikipedia.org/wiki/Virtual_Router_Redundancy_Protocol)
interfaces: one that I use to terminate the IPv6 tunnel, and the other I use
for all IPv4 traffic. When both routers are up and running, one will have the
IPv6 tunnel and the other will have the IPv4 interface. Each one advertises a
default route for the address family it's taking care of. If one goes down,
the interface will fail over and everything reconverges rather quickly. IPv6
connections are unaffected, as the routing is stateless and traffic continues
to flow normally. IPv4 connections may get interrupted as the NAT state is
lost.
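
I won't reproduce the exact setup here, but a sketch of the idea using FRR's
VRRP support could look like this on one of the two routers (the peer runs the
same configuration with the priorities swapped; addresses are placeholders,
and FRR's vrrpd also expects matching macvlan devices on the Linux side, which
are omitted):

```
interface wan0
 ! This router's own static address.
 ip address 192.0.2.10/24
 ! Virtual address used for all IPv4 traffic (and its NAT).
 vrrp 14 version 3
 vrrp 14 priority 200
 vrrp 14 ip 192.0.2.12
 ! Virtual address that terminates the IPv6 tunnel.
 vrrp 16 version 3
 vrrp 16 priority 100
 vrrp 16 ip 192.0.2.13
```

With the priorities mirrored on the second router, each one normally owns one
of the two virtual addresses, and either can take over both if its peer goes
down.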

#### Access Routers

The interfaces facing the client machines provide connectivity for both IPv4
and IPv6.

The IPv6 configuration is much simpler. FRR can be configured to send router
advertisements to the subnets. Both routers are configured to advertise their
presence, as well as the subnet prefixes and DNS information. Client machines
will pick these up, and then have both routers as their default gateways.
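
As a rough example, the FRR side of those router advertisements could look
like this (the prefix and DNS addresses are placeholders):

```
interface lan0
 ipv6 address 2001:db8:10::1/64
 ! Send router advertisements so clients configure themselves with SLAAC.
 no ipv6 nd suppress-ra
 ipv6 nd prefix 2001:db8:10::/64
 ! Advertise the recursive DNS servers (the anycast addresses) via RDNSS.
 ipv6 nd rdnss 2001:db8:0:53::1
 ipv6 nd rdnss 2001:db8:0:53::2
```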

While IPv6 configuration is seamless, IPv4 relies on VRRP to share a `.1`
default gateway address, which, though functional, lacks the elegance of
IPv6's stateless design.

## It Works

After I got this all in place, it was finally possible to build myself a
working Kubernetes cluster and migrate all of my old services over to it. The
transition to Kubernetes not only streamlined service management but also laid
the foundation for future scalability and automation. I'll get into that
adventure in the next series of articles.