Differences between revisions 3 and 35 (spanning 32 versions)

persica cluster

This is a cluster of three identical nodes, named persica1/2/3

Alma Linux 9.4 x64
Dell Optiplex 9020 Micro
- Intel Core i5-4590T @ 2.00 GHz
- 16gb DDR3-1600
- 128gb SSD SK hynix SC311 SATA M.2
- I bought the three of them for $405 in total, so $135 AUD each, in March 2023.

I also later picked up an extra node to run as the controller, kalina, and whatever other services it might need. The controller is fairly heavyweight, so the Raspberry Pi won't cut it, and I don't want to jam it on the machine hosting my webserver at the same time.

Alma Linux 9.4 x64
Lenovo ThinkCentre M710q
- Intel Core i5-6500T @ 2.50GHz (4-core no HT)
- 16gb DDR4-2133 (DIMMs specced for 2667)
- 256gb SSD SK hynix PC601 NVMe M.2
I bought this one for $159 in November 2023

I've also added a dedicated Mikrotik hEX S router (model RB760iGS) to the setup, it gives a dedicated /24 subnet to the cluster, routed to the rest of my LAN but without using NAT. Now I get to learn OSPF, and BGP once I add MetalLB to the cluster for ingress.

This has been such a chore getting things working smoothly, it's just so damn finicky and it makes my notes a mess. I've tried to clean them up, but I'll keep all the failure notes in a section at the end.

Doing it mostly raw with kubeadm sucked, the docs are completely unopinionated and give you every option at every instance of there being a choice. Great if you know what you're doing, but if you know what you're doing then you don't really need those docs
This guide looks like an improvement, something with actual explanations and a few opinions: https://github.com/hobby-kube/guide
So many guides assume you're doing this in the cloud, which is a fair assumption for a beginner starting with no infra, but they make too many logical leaps, leaving you to fill in the gaps yourself, or they just can't be applied on your own baremetal

Intro

So here's the gist of this setup, third or fourth attempt now:

I'm going to use Rancher for the controlplane
Alma 9.4 because it's the latest
Move it to a new subnet of 192.168.3.0/26 and put that behind the new Mikrotik router, helena. This means DHCP stays within the cluster, though the PXE service host is still outside.
- persica1 / 192.168.3.3
- persica2 / 192.168.3.4
- persica3 / 192.168.3.5
- kalina / 192.168.3.2
- persica / CNAME to kalina for the Rancher web interface
Try using Longhorn for PVCs, though Portworx could be on the cards as well. At least I understand it now
Will try using MetalLB for non-http ingress

Hardware prep for the cluster nodes

Setup each new node like so, it's stuff that we just need to do one time when we receive the hardware:

k8s nodes: servers/HardwarePrep/DellOptiplex9020Micro
controller: servers/HardwarePrep/LenovoThinkCentreM710q

Prepare azusa for PXE services

This is needed so we can build kalina and the persica nodes consistently and easily. It can be used for other systems on the LAN as well, it's not just for this cluster.

Build servers/azusa as the network services node, directions on how to configure these components are on her page.

Client netboots in UEFI mode and performs DHCP to get an IP address and PXE options
helena (router) points to azusa as the PXE boot next-server
azusa serves grubx64.efi as the EFI bootloader, via its TFTP server
grub reads grub.cfg and fetches menu entries specific to the client, based on its MAC address, also via TFTP
The client boots the kickstart installer target, fetching vmlinuz and initrd.img from azusa via TFTP
Kickstart begins thanks to kernel cmdline options, fetching the kickstart config from azusa, now via HTTP
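For the per-client step, GRUB's stock network-boot lookup tries a config file named after the client's MAC address before falling back to plain grub.cfg. Here's a minimal sketch of that naming, assuming the stock lookup is what's in play (the setup on azusa may instead source a per-MAC file from inside grub.cfg). The function name is made up for illustration:

```shell
# Hypothetical helper: turn a MAC address into the per-client config file
# name that GRUB looks for when netbooting: grub.cfg-01-<mac>, where 01 is
# the Ethernet hardware type and the MAC is lower-case and dash-separated.
grub_cfg_name_for_mac() {
    local mac
    mac=$(printf '%s' "$1" | tr 'A-F:' 'a-f-')
    printf 'grub.cfg-01-%s\n' "$mac"
}

# Usage: grub_cfg_name_for_mac 'AA:BB:CC:DD:EE:FF'
```

Handy when you're staring at the TFTP logs and want to know which file name a given node should be asking for.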

ansible management for the cluster

azusa will also host the ansible repo for managing the cluster.

Once a node is built with kickstart and online, we'll run an ansible playbook against it to get it up to spec. Make minimal assumptions about the kickstart part of the process, let ansible do the rest.

Login as myself, furinkan
Repo for the cluster is in ~/git/ansible/

Valid targets are simple:

make kalina   # just the controller
make persica  # controller and k8s nodes

Have nice SSH config so azusa can connect to each k8s node easily

Make yourself a little config in ~/.ssh/config

Host *
User root
IdentityFile ~/git/ansible/sshkey_ed25519

Prepare kalina controller node

Now build kalina:

Kickstart-build kalina using the configs on azusa
Run ansible against kalina, this will configure the OS and install docker.
Check that docker works
```
docker run hello-world
```
Push the certs from illustrious to kalina, we're using real publicly trusted CA-signed certs: https://ranchermanager.docs.rancher.com/pages-for-subheaders/rancher-on-a-single-node-with-docker#option-c-bring-your-own-certificate-signed-by-a-recognized-ca
- On illustrious:
```
cd /etc/ssl/

rsync -avx \
  STAR_thighhighs_top.key \
  STAR_thighhighs_top.crtbundled \
  STAR_thighhighs_top.key.2023 \
  STAR_thighhighs_top.crtbundled.2023 \
  root@kalina:/etc/ssl/
```
- Then on kalina:
```
chown root:root /etc/ssl/STAR_thighhighs_top.*
```

Run Rancher on kalina

If you're doing this on an ARM system follow this guide, it just tells you to specify an exact version so you know it's built with arm64 support: https://ranchermanager.docs.rancher.com/how-to-guides/advanced-user-guides/enable-experimental-features/rancher-on-arm64

Well I'm on x86 now so that doesn't matter, but I'm still going to specify an exact version because I'm sensible and want a repeatable build with no surprises.

docker run -d --restart=unless-stopped \
  -p 443:443 \
  -v /etc/ssl/STAR_thighhighs_top.crtbundled:/etc/rancher/ssl/cert.pem \
  -v /etc/ssl/STAR_thighhighs_top.key:/etc/rancher/ssl/key.pem \
  --privileged \
  rancher/rancher:v2.6.6 \
  --no-cacerts

It'll take some time to start. Then you can try hitting the Rancher web UI: https://kalina.thighhighs.top/
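Rather than mashing reload in the browser, a small retry loop can do the waiting. This is just a sketch: the helper name, the attempt count and the delay are all my own picks, and it only assumes the web UI eventually answers without an HTTP error on the URL above:

```shell
# Retry a command until it succeeds or the attempts run out.
# usage: wait_for <attempts> <delay-seconds> <command...>
wait_for() {
    local attempts="$1" delay="$2" i=1
    shift 2
    while [ "$i" -le "$attempts" ]; do
        if "$@"; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Usage (not run here): poll the web UI every 10s for up to 10 minutes
#   wait_for 60 10 curl -fsS -o /dev/null https://kalina.thighhighs.top/
```

The same helper works for anything else in these notes that "takes a while", since it just wraps whatever command you hand it.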

Login with the local user password as directed, then let it set the new admin password. Record it somewhere safe, and set the server URL to https://persica.thighhighs.top because that's how we're going to access the cluster once we're done.

Build the k8s nodes

Manually kick the BIOS of each node to do a one-time PXE boot (mash F12 during POST), then let it do its thing.

Ansible-ise the k8s nodes

On azusa, run ansible against the hosts to configure the OS and install docker.

make persica

Stand up the cluster

We're following these instructions: https://ranchermanager.docs.rancher.com/pages-for-subheaders/use-existing-nodes

From the Dashboard click the Create button
Select Use existing nodes and create a cluster using RKE
Fill in the details
- cluster name: persica
- leave most options as default
- I'm picking k8s version v1.20.15-rancher2-2 so it matches what we run at work, and I can test upgrades at home
- set the docker root directory to /persist/docker because we're moving to a disk with plenty of space, separate to the OS
- Allow unsupported versions of Docker is already enabled; we need this because we're using a much newer distro and docker version
- Hit Next to go to the next page
Check the boxes for all three cluster roles, all nodes will perform all roles

Go ahead and run the supplied command on each node. I like to do it one at a time so I can watch it

docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run  rancher/rancher-agent:v2.6.6 --server https://persica.thighhighs.top --token lx5qjbl4dn7zkpbmt5qqz8qfdvtgsl2x5ft95j8lh785bxrjjccq2t --etcd --controlplane --worker

# The agent container gets a random name, find yours with `docker ps`
docker logs -f recursing_proskuriakova

Give it like 10min. Eventually the container logs that you're following will stop, because the container terminates once all the k8s components are up and running.

Install kubectl on controller kalina

This friggen sucks for older versions, no package management for you!

https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-kubectl-binary-with-curl-on-linux
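The linked instructions boil down to fetching a binary from dl.k8s.io for the exact release you want. A rough sketch, assuming the version matches the cluster (v1.20.15 here) and that the helper name is just something I made up:

```shell
# Build the download URL for a specific kubectl release, using the
# dl.k8s.io layout from the upstream install docs.
# usage: kubectl_url <version> [arch]
kubectl_url() {
    local version="$1"          # e.g. v1.20.15
    local arch="${2:-amd64}"    # amd64 or arm64
    printf 'https://dl.k8s.io/release/%s/bin/linux/%s/kubectl\n' "$version" "$arch"
}

# Usage (not run here): fetch the binary and its checksum, verify, install
#   curl -fsSLO "$(kubectl_url v1.20.15)"
#   curl -fsSLO "$(kubectl_url v1.20.15).sha256"
#   echo "$(cat kubectl.sha256)  kubectl" | sha256sum --check
#   install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
```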

Once you've got it installed, go to Rancher and Explore the persica cluster (https://persica.thighhighs.top/dashboard/c/c-gfnh7/explorer#cluster-events), then copy the kubeconfig to your clipboard with the button in the toolbar at the top of the screen.

Go paste that into ~/.kube/config in your account on kalina, now you can run kubectl there!

Add this to your ~/.bashrc for cool tab-completion:

if hash kubectl 2>/dev/null ; then
    source <(kubectl completion bash)
fi

Install Longhorn cluster storage manager

This is done from the builtin Helm charts, let it go to work. It's a couple of simple clicks: https://persica.thighhighs.top/dashboard/c/c-gfnh7/apps/charts?category=Storage

For some reason the predefined things you can configure on the helm chart don't include the local path to the disk on each node. Which is pretty bloody obvious you'd think, but no. It'll default to /var/lib/longhorn or something unless you override it.

Install into the System project
Do customise helm options before install
Go to the Edit YAML page and change the defaultDataPath to /persist/longhorn/ instead
Now you can run the install

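I tried out this dude's demo app that uses flask and redis to deploy a trivial website, that was a nifty test of all the bits working together as expected:

https://ranchergovernment.com/blog/article-simple-rke2-longhorn-and-rancher-install#longhorn-gui
https://raw.githubusercontent.com/clemenko/k8s_yaml/master/flask_simple_nginx.yml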

Blessedly the ingress just works. No idea what to do yet to make a service that presents itself on public IPs.

Prepare dummy DNS records so we can test ingress and load balancing

Apps need ingress, and ingress means you need hostnames to refer to stuff. Let's add these to our zone:

# Dodgy roundrobin for "load balancing" of ingress connections, which are terminated by a proxy on any node
persicanodes 300 IN A 192.168.3.3
persicanodes 300 IN A 192.168.3.4
persicanodes 300 IN A 192.168.3.5

# Now some unique names for all the apps we're going to try
app1.persica 300 IN CNAME persicanodes
app2.persica 300 IN CNAME persicanodes
app3.persica 300 IN CNAME persicanodes
app4.persica 300 IN CNAME persicanodes
app5.persica 300 IN CNAME persicanodes

# These will be BGP or Layer2 MetalLB IPs
lb1.persica 300 IN A 192.168.3.65
lb2.persica 300 IN A 192.168.3.66
lb3.persica 300 IN A 192.168.3.67
lb4.persica 300 IN A 192.168.3.68
lb5.persica 300 IN A 192.168.3.69
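The records above only make sense if the node addresses stay inside 192.168.3.0/26 and the lb addresses land in the 192.168.3.64/26 range set aside for MetalLB. Here's a quick sanity check for that, as a sketch in plain shell arithmetic (IPv4 only, helper names are my own invention):

```shell
# Convert a dotted-quad IPv4 address to a plain integer.
ip_to_int() {
    local IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed if address $1 falls inside CIDR block $2 (e.g. 192.168.3.64/26).
ip_in_cidr() {
    local ip net bits mask
    ip=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    # 4294967295 is 0xFFFFFFFF; shift in zeroes for the host bits
    mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
    [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# Usage: ip_in_cidr 192.168.3.65 192.168.3.64/26 && echo "lb1 is in the pool"
```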

Load balancing with MetalLB

I thought I wouldn't need it, but it looks like I do, if I want sensible useful functionality. Here's an explanation of why I want to use MetalLB, and it's not just for BGP-based configs: https://github.com/kubernetes/ingress-nginx/blob/main/docs/deploy/baremetal.md

Install it:

RTFM: https://metallb.universe.tf/installation/
Grab the manifest and pull it into the repo, I'm using this one as it's similar to work: https://github.com/metallb/metallb/blob/v0.9/manifests/metallb.yaml

Create the namespace first, I'm putting it into the System project:

apiVersion: v1
kind: Namespace
metadata:
  name: metallb-system
  # This is the System project on the prod cluster
  annotations:
    field.cattle.io/projectId: c-gfnh7:p-db8t4
  labels:
    app: metallb

Create the metallb resources: kubectl apply -f 01-metallb.yaml
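Create the memberlist secret that the nodes need to communicate: kubectl -n metallb-system create secret generic memberlist --from-literal=secretkey="$(openssl rand -base64 128)" (write $$ instead of $ only if this line lives in a Makefile recipe; in a plain shell $$ expands to the shell's PID and the secret ends up as junk)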
Setup the configmap to configure its behaviour, they have a fully documented example here: https://github.com/metallb/metallb/blob/v0.9/manifests/example-config.yaml
Apply the config: kubectl apply -f 02-config.yaml

Configure BGP

https://metallb.universe.tf/configuration/#bgp-configuration

Let's go for an iBGP design here - we both belong to the same private AS, number 64520

On helena:

/routing/bgp/connection/add name=persica1 remote.address=192.168.3.3 as=64520 local.role=ibgp
/routing/bgp/connection/add name=persica2 remote.address=192.168.3.4 as=64520 local.role=ibgp
/routing/bgp/connection/add name=persica3 remote.address=192.168.3.5 as=64520 local.role=ibgp

And in metallb we drop this config in:

data:
  config: |
    peers:
      - peer-address: 192.168.3.1
        peer-asn: 64520
        my-asn: 64520
    address-pools:
      - name: persica-lb
        protocol: bgp
        addresses:
          - 192.168.3.64/26
        avoid-buggy-ips: true
        auto-assign: false
        bgp-advertisements:
          - aggregation-length: 32
            localpref: 100
            communities:
              - no-export
    bgp-communities:
      # "Do not advertise this route to external BGP peers"
      no-export: 65535:65281
      # "Do not advertise this route to any peer"
      no-advertise: 65535:65282

The moment I apply this, helena sees a connection from the persica nodes, awesome.

Then we just need to define a loadbalanced service in k8s, and they'll start advertising the address.

With a bit of faffing, it does just that. Had to force it to pick the IP I wanted, it uses .64 initially which I don't want. Our version doesn't respect the request by annotation, but spec.loadBalancerIP works (though it's deprecated).
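For reference, pinning the address looks roughly like this. It's a sketch that prints a Service manifest with spec.loadBalancerIP set; the function and all five arguments are placeholders for illustration, with the example values borrowed from the redis test service further down these notes:

```shell
# Print a LoadBalancer Service manifest that requests a specific address
# via spec.loadBalancerIP (deprecated upstream, but it's the one this
# MetalLB version honours).
# usage: render_lb_service <name> <namespace> <app-label> <port> <ip>
render_lb_service() {
    cat <<EOF
apiVersion: v1
kind: Service
metadata:
  name: $1
  namespace: $2
spec:
  type: LoadBalancer
  loadBalancerIP: $5
  selector:
    app: $3
  ports:
    - protocol: TCP
      port: $4
EOF
}

# Usage (not run here):
#   render_lb_service redis-ext flask redis 6379 192.168.3.65 | kubectl apply -f -
```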

Try MetalLB in Layer 2 mode first

NB: this is old

I'll use it in L2 mode with ARP/NDP I think. Just need to dedicate a bunch of IPs to it so it can manage the traffic to them.

Holy crap I think I got it working.

We'll use it in L2 mode, no BGP yet
Set aside 192.168.3.64 - 192.168.3.127 for load balanced services
Install it via Rancher helm chart interface, no config

Push a simple address pool and advertisement config

---
# https://metallb.universe.tf/configuration/#layer-2-configuration

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: metallb-pool-1
  namespace: metallb-system
spec:
  addresses:
    - 192.168.3.65-192.168.3.126

---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: metallb-pool-1
  namespace: metallb-system
# Not needed because L2Advertisement claims all IPAddressPools by default
spec:
  ipAddressPools:
    - metallb-pool-1

Copy the existing redis service in the example, and add an external access route to it as a secondary service

---
apiVersion: v1
kind: Service
metadata:
  namespace: flask
  name: redis-ext
  labels:
    name: redis
    kubernetes.io/name: "redis"

spec:
  selector:
    app: redis
  ports:
    - name: redis
      protocol: TCP
      port: 6379
  type: LoadBalancer

It's really as simple as adding type: LoadBalancer, then MetalLB selects the next free IP itself and binds it.

Try it in BGP mode next

TBC

Making ingress work - was this for the kubeadm method?

I don't understand this well enough, but I want to use ingress-nginx. Here's a page about it, albeit not using raw kubectl: https://kubernetes.github.io/ingress-nginx/kubectl-plugin/

Maybe this one too: https://medium.com/tektutor/using-nginx-ingress-controller-in-kubernetes-bare-metal-setup-890eb4e7772

Things that suck

cgroups

Alma 9 defaults to cgroups v2, which weren't a thing on CentOS 7. That means you have to deal with them now. They tend to break docker a lot, so just revert to v1 cgroups.

How it manifests:

For context: fucking cgroups, k3s dies instantly
https://github.com/rancher/rancher/issues/35201#issuecomment-947331154
https://groups.google.com/g/linux.debian.bugs.dist/c/Z-Cc0WmlEGA/m/NB6XGDsnAwAJ
Finally found a simple solution: https://github.com/rancher/rancher/issues/36165

Fix it:

Append an option to the kernel cmdline, this'll do it for you:

grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"

Then reboot for it to take effect
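After the reboot it's worth confirming which hierarchy you actually ended up with. The filesystem type mounted at /sys/fs/cgroup tells you: cgroup2fs means v2, tmpfs means v1. A tiny sketch (the helper name is made up):

```shell
# Map the filesystem type of /sys/fs/cgroup to a cgroup version label.
cgroup_mode() {
    case "$1" in
        cgroup2fs) echo v2 ;;
        tmpfs)     echo v1 ;;
        *)         echo unknown ;;
    esac
}

# Usage (on a real node): cgroup_mode "$(stat -fc %T /sys/fs/cgroup)"
```

If it still says v2, the kernel cmdline change didn't stick; check /proc/cmdline for the systemd.unified_cgroup_hierarchy=0 option.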

Networking kernel modules

The problem: you fixed cgroups but now you get an error like this when Rancher starts up:

I1125 03:57:50.129406      93 network_policy_controller.go:163] Starting network policy controller
F1125 03:57:50.130225      93 network_policy_controller.go:404] failed to run iptables command to create KUBE-ROUTER-FORWARD chain due to running [/usr/bin/iptables -t filter -S KUBE-ROUTER-FORWARD 1 --wait]: exit status 3: iptables v1.8.8 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
panic: F1125 03:57:50.130225      93 network_policy_controller.go:404] failed to run iptables command to create KUBE-ROUTER-FORWARD chain due to running [/usr/bin/iptables -t filter -S KUBE-ROUTER-FORWARD 1 --wait]: exit status 3: iptables v1.8.8 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.

And then it explodes and the container dies.

Turns out you need some iptables modules loaded. This fixed it the first time:

[root@kalina ~]# modprobe iptable_nat
[root@kalina ~]# modprobe br_netfilter

But it happened again the next time I rebuilt the cluster. You gotta make it stick by adding config fragments to /etc/modules-load.d
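A fragment in /etc/modules-load.d is just a .conf file with one module name per line, which systemd loads at boot. Something like this would write it; the file name rancher-iptables.conf is my own invention, and the directory is an argument so you can try it somewhere harmless first:

```shell
# Write a modules-load.d fragment listing the two modules loaded above.
# usage: write_modules_fragment [dir]   (defaults to /etc/modules-load.d)
write_modules_fragment() {
    local dir="${1:-/etc/modules-load.d}"
    printf '%s\n' iptable_nat br_netfilter > "$dir/rancher-iptables.conf"
}

# Usage (as root on each node): write_modules_fragment
```

In practice this belongs in the ansible playbook so it survives the next rebuild.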

Explanations:

This kinda describes the issue: https://slack-archive.rancher.com/t/9761163/hey-folks-i-have-a-quick-question-for-a-newbie-i-have-setup-
Yeah it turns out that the rancher container fucking dies in the arse with no explanation when you don't have the iptables modules loaded, duhhhh. I figured that out and made them load on-boot like so: https://forums.centos.org/viewtopic.php?t=72040

Firewalls

Now whyTF can't persica2 and persica3 contact services on persica1..? Aha, firewalld is running on persica1, and it shouldn't be. Need to disable it using ansible as well.

systemctl disable firewalld.service --now

Yeah that's jank, but hey it's what they tell you to do! https://ranchermanager.docs.rancher.com/how-to-guides/advanced-user-guides/open-ports-with-firewalld

"We recommend disabling firewalld. For Kubernetes 1.19.x and higher, firewalld must be turned off."

Cleanup and try again

Find that it doesn't work and you can't make it work, awesome. Tear it all down and start again, killing every container, nuking files, and starting from scratch: https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/manage-clusters/clean-cluster-nodes#directories-and-files

Eventually, you get a cluster with three working nodes in it!!

Installing older versions of kubectl

Running an older version of k8s and need an older version of kubectl to go with it? You're shit out of luck, my friend!

https://kubernetes.io/blog/2023/08/15/pkgs-k8s-io-introduction/

They moved to new package repos in 2023, and as of early 2024 the old repos are gone! The new repos only have v1.24 and newer, so if you need anything older it's just not there.

Looks like our last option is: "You can directly download binaries instead of using packages. As an example, see Without a package manager instructions in 'Installing kubeadm' document."

And you end up here: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-kubectl-binary-with-curl-on-linux

Here's a modern way of defining the repo on debian-type systems btw:

# Our cluster is k8s v1.23 so we can use kubectl as late as 1.24

curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.24/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

cat <<EOF > /etc/apt/sources.list.d/kubernetes.sources
X-Repolib-Name: Kubernetes
Enabled: yes
Types: deb
URIs: https://pkgs.k8s.io/core:/stable:/v1.24/deb/
Suites: /
Architectures: arm64
Signed-By: /etc/apt/keyrings/kubernetes-apt-keyring.gpg
EOF

apt update
apt install -y kubectl
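The comment about v1.23 and v1.24 leans on the upstream version skew rule: kubectl is supported within one minor version (older or newer) of the API server. A small sketch to check a pairing before you bother installing, assuming both versions share the same major number (helper names are hypothetical):

```shell
# Extract the minor number from a version like v1.23.4 or 1.24.
k8s_minor() {
    local v="${1#v}"
    v="${v#*.}"
    echo "${v%%.*}"
}

# Succeed if client and server are within one minor version of each other.
# usage: kubectl_skew_ok <client-version> <server-version>
kubectl_skew_ok() {
    local diff
    diff=$(( $(k8s_minor "$1") - $(k8s_minor "$2") ))
    [ "${diff#-}" -le 1 ]
}

# Usage: kubectl_skew_ok v1.24.0 v1.23.17 && echo "close enough"
```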

You can't use selinux

It just breaks way too much shit, it's not worth it. Install something new and it doesn't work? You'll forever be wondering "is it selinux" immediately after it fails.

-  ⇤ ← Revision 3 as of 2023-04-04 14:13:52 → 
  Size: 2423
  Editor: furinkan
  Comment: k8s provisioning notes
+   ← Revision 35 as of 2024-05-08 17:23:37 → ⇥
  Size: 21040
  Editor: furinkan
  Comment: get metallb working
 Line 5:
- * Alma Linux 9.1 x64
+ * Alma Linux 9.4 x64
 Line 9:
-  * 128gb SSD
+  * 128gb SSD SK hynix SC311 SATA M.2
  * I bought the three of them for $405 in total, so $135 AUD each, in March 2023.

I also later picked up an extra node to run as the controller, [[../kalina| kalina]], and whatever other services it might need. The controller is fairly heavyweight, so the Raspberry Pi won't cut it, and I don't want to jam it on the machine hosting my webserver at the same time.
 * Alma Linux 9.4 x64
 * Lenovo !ThinkCentre M710q
  * Intel Core i5-6500T @ 2.50GHz (4-core no HT)
  * 16gb DDR4-2133 (DIMMs specced for 2667)
  * 256gb SSD SK hynix PC601 NVMe M.2
 * I bought this one for $159 in November 2023

I've also added a dedicated Mikrotik hEX S router (model RB760iGS) to the setup, it gives a dedicated /24 subnet to the cluster, routed to the rest of my LAN but '''without using NAT'''. Now I get to learn OSPF, and BGP once I add MetalLB to the cluster for ingress.

{{{#!wiki note
This has been such a chore getting things working smoothly, it's just so damn finicky and it makes my notes a mess. I've tried to clean them up, but I'll keep all the failure notes in a section at the end.

 * Doing it mostly raw with kubeadm sucked, the docs are completely unopinionated and give you every option at every instance of there being a choice. Great if you know what you're doing, but if you know what you're doing then you don't really need those docs
 * This guide looks like an improvement, something with actual explanations and a few opinions: https://github.com/hobby-kube/guide
 * So many guides assume you're doing this in the cloud, which is a fair assumption for starting as a beginner with no infra, but they make too many logical leaps that you have to fill in the gaps yourselves, or just can't be applied on your own baremetal
}}}
-Line 13:
+Line 32:
-== k8s notes ==

 * Make a simple 3-node cluster
 * Single-node control plane will run externally, on illustrious
 * Use kubeadm to build the cluster: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
 * Selected containerd as the container runtime
 * Will use Flannel as the networking plugin
 * Allocated IPs:
  * persica1 / 192.168.1.31
  * persica2 / 192.168.1.32
  * persica3 / 192.168.1.33
 * Ingress: undecided so far
 * Cgroup driver: let's use systemd
 * k8s version: whatever is latest right now (2023-04-04)

== Build notes ==

 * Full UEFI mode
 * PXE boot for kickstart install
 * tftpd-hpa running on illustrious
  * Upstream repo mirror: https://repo.almalinux.org/almalinux/9/BaseOS/x86_64/os/EFI/BOOT/
 * kickstart file served from `/data/www/illustrious/ks`: https://illustrious.thighhighs.top/ks/persica1.ks.cfg
 * KS references:
  * Reference manual: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/performing_an_advanced_rhel_9_installation/kickstart-commands-and-options-reference_installing-rhel-as-an-experienced-user#keyboard-required_kickstart-commands-for-system-configuration
  * Generator tool: https://access.redhat.com/labs/kickstartconfig/

This was useful for figuring out the TFTP stuff for the first time. Paths are hardcoded into the `grubx64.efi` binary, meaning HDD and PXE versions aren't the same. Make sure you put all the grub stuff in a `grub/` directory. Check the `$prefix` to see where it's searching: https://askubuntu.com/questions/1183487/grub2-efi-boot-via-pxe-load-config-file-automatically

I should ansible'ise everything. Can I start with this?
{{{
AlmaLinux 9 - AppStream                                                                                                                                                                          3.0 MB/s | 3.1 kB     00:00
Importing GPG key 0xB86B3716:
 Userid     : "AlmaLinux OS 9 <packager@almalinux.org>"
 Fingerprint: BF18 AC28 7617 8908 D6E7 1267 D36C B86C B86B 3716
 From       : /etc/pki/rpm-gpg/RPM-GPG-KEY-AlmaLinux-9
Is this ok [y/N]: y
Key imported successfully
}}}
+== Intro ==
So here's the gist of this setup, third or fourth attempt now:
 * I'm going to use Rancher for the controlplane
 * Alma 9.4 because it's the latest
 * Move it to a new subnet of 192.168.3.0/26 and put that behind the new Mikrotik router, helena. This means DHCP stays within the cluster, though the PXE service host is still outside.
  * persica1 / 192.168.3.3
  * persica2 / 192.168.3.4
  * persica3 / 192.168.3.5
  * kalina / 192.168.3.2
  * persica / CNAME to kalina for the Rancher web interface
 * Try using Longhorn for PVCs, though Portworx could be on the cards as well. At least I understand it now
 * Will try using MetalLB for non-http ingress


== Hardware prep for the cluster nodes ==

Setup each new node like so, it's stuff that we just need to do one time when we receive the hardware:
 * k8s nodes: [[servers/HardwarePrep/DellOptiplex9020Micro]]
 * controller: [[servers/HardwarePrep/LenovoThinkCentreM710q]]


== Prepare azusa for PXE services ==

This is needed so we can build kalina and the persica nodes consistently and easily. It can be used for other systems on the LAN as well, it's not just for this cluster.

Build [[servers/azusa]] as the network services node, directions on how to configure these components are on her page.
 * Client netboots in UEFI mode and performs DHCP to get an IP address and PXE options
 * helena (router) points to azusa as the PXE boot `next-server`
 * azusa serves `grubx64.efi` as the EFI bootloader, via its TFTP server
 * grub reads grub.cfg and fetches menu entries specific to the client, based on its MAC address, also via TFTP
 * The client boots the kickstart installer target, fetching `vmlinuz` and `initrd.img` from azusa via TFTP
 * Kickstart begins thanks to kernel cmdline options, fetching the kickstart config from azusa, now via HTTP

=== ansible management for the cluster ===

azusa will also host the ansible repo for managing the cluster.

Once a node is built with kickstart and online, we'll run an ansible playbook against it to get it up to spec. Make minimal assumptions about the kickstart part of the process, let ansible do the rest.

 * Login as myself, `furinkan`
 * Repo for the cluster is in `~/git/ansible/`
 * Valid targets are simple: {{{
make kalina   # just the controller
make persica  # controller and k8s nodes
}}}

=== Have nice SSH config so azusa can connect to each k8s node easily ===

Make yourself a little config in `~/.ssh/config`
{{{
Host *
User root
IdentityFile ~/git/ansible/sshkey_ed25519
}}}


== Prepare kalina controller node ==

Now build kalina:
 1. Kickstart-build kalina using the configs on azusa
 2. Run ansible against kalina, this will configure the OS and install docker.
 3. Check that docker works {{{
docker run hello-world
}}}
 4. Push the certs from illustrious to kalina, we're using real publicly trusted CA-signed certs: https://ranchermanager.docs.rancher.com/pages-for-subheaders/rancher-on-a-single-node-with-docker#option-c-bring-your-own-certificate-signed-by-a-recognized-ca
  * On illustrious: {{{
cd /etc/ssl/

rsync -avx \
  STAR_thighhighs_top.key \
  STAR_thighhighs_top.crtbundled \
  STAR_thighhighs_top.key.2023 \
  STAR_thighhighs_top.crtbundled.2023 \
  root@kalina:/etc/ssl/
}}}
  * Then on kalina: {{{
chown root:root /etc/ssl/STAR_thighhighs_top.*
}}}

== Run Rancher on kalina ==

If you're doing this on an ARM system follow this guide, it just tells you to specify an exact version so you know it's built with arm64 support: https://ranchermanager.docs.rancher.com/how-to-guides/advanced-user-guides/enable-experimental-features/rancher-on-arm64

Well I'm on x86 now so that doesn't matter, but I'm still going to specify an exact version because I'm sensible and want a repeatable build with no surprises.
{{{
docker run -d --restart=unless-stopped \
  -p 443:443 \
  -v /etc/ssl/STAR_thighhighs_top.crtbundled:/etc/rancher/ssl/cert.pem \
  -v /etc/ssl/STAR_thighhighs_top.key:/etc/rancher/ssl/key.pem \
  --privileged \
  rancher/rancher:v2.6.6 \
  --no-cacerts
}}}

It'll take some time to start. Then you can try hitting the Rancher web UI: https://kalina.thighhighs.top/

Login with the local user password as directed, then let it set the new admin password. Record it somewhere safe, and set the server URL to https://persica.thighhighs.top because that's how we're going to access the cluster once we're done.


== Build the k8s nodes ==

Manually kick the BIOS of each node to do a one-time PXE boot (mash F12 during POST), then let it do its thing.


== Ansible-ise the k8s nodes ==

On azusa, run ansible against the hosts to configure the OS and install docker.
{{{
make persica
}}}


== Stand up the cluster ==

We're following these instructions: https://ranchermanager.docs.rancher.com/pages-for-subheaders/use-existing-nodes

 1. From the Dashboard click the Create button
 2. Select ''Use existing nodes and create a cluster using RKE''
 3. Fill in the details
  * cluster name: persica
  * leave most options as default
  * I'm picking k8s version `v1.20.15-rancher2-2` so it matches what we run at work, and I can test upgrades at home
  * set the docker root directory to `/persist/docker` because we're moving to a disk with plenty of space, separate to the OS
  * ''Allow unsupported versions'' of Docker is already enabled; we need this because we're using a much newer distro and docker version
  * Hit Next to go to the next page
 4. Check the boxes for all three cluster roles, all nodes will perform all roles
 5. Go ahead and run the supplied command on each node. I like to do it one at a time so I can watch it {{{
docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run  rancher/rancher-agent:v2.6.6 --server https://persica.thighhighs.top --token lx5qjbl4dn7zkpbmt5qqz8qfdvtgsl2x5ft95j8lh785bxrjjccq2t --etcd --controlplane --worker

docker logs recursing_proskuriakova -f
}}}

Give it like 10min. Eventually the container logs that you're following will stop, because the container terminates once all the k8s components are up and running.


== Install kubectl on controller kalina ==

This friggen sucks for older versions, no package management for you!

https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-kubectl-binary-with-curl-on-linux

Once you've got it installed, go to Rancher and Explore the persica cluster (https://persica.thighhighs.top/dashboard/c/c-gfnh7/explorer#cluster-events), then copy the kubeconfig to your clipboard with the button in the toolbar at the top of the screen.

Go paste that into `~/.kube/config` in your account on kalina, now you can run `kubectl` there!

Add this to your `~/.bashrc` for cool [[https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#enable-shell-autocompletion| tab-completion]]: {{{
if hash kubectl 2>/dev/null ; then
    source <(kubectl completion bash)
fi
}}}


== Install Longhorn cluster storage manager ==

This is done from the builtin Helm charts, let it go to work. It's a couple of simple clicks: https://persica.thighhighs.top/dashboard/c/c-gfnh7/apps/charts?category=Storage

For some reason the predefined things you can configure on the helm chart '''don't''' include the local path to the disk on each node. Which is pretty bloody obvious you'd think, but no. It'll default to `/var/lib/longhorn` or something unless you override it.

 1. Install into the ''System'' project
 2. Do customise helm options before install
 3. Go to the Edit YAML page and change the `defaultDataPath` to `/persist/longhorn/` instead
 4. Now you can run the install

I tried out this dude's demo app that uses flask and redis to deploy a trivial website, that was a nifty test of all the bits working together as expected:
 * https://ranchergovernment.com/blog/article-simple-rke2-longhorn-and-rancher-install#longhorn-gui
 * https://raw.githubusercontent.com/clemenko/k8s_yaml/master/flask_simple_nginx.yml

Blessedly the ingress just works. No idea what to do yet to make a service that presents itself on public IPs.

== Prepare dummy DNS records so we can test ingress and load balancing ==
Apps need ingress, and ingress means you need hostnames to refer to stuff. Let's add these to our zone:
{{{
# Dodgy roundrobin for "load balancing" of ingress connections, which are terminated by a proxy on any node
persicanodes 300 IN A 192.168.3.3
persicanodes 300 IN A 192.168.3.4
persicanodes 300 IN A 192.168.3.5

# Now some unique names for all the apps we're going to try
app1.persica 300 IN CNAME persicanodes
app2.persica 300 IN CNAME persicanodes
app3.persica 300 IN CNAME persicanodes
app4.persica 300 IN CNAME persicanodes
app5.persica 300 IN CNAME persicanodes

# These will be BGP or Layer2 MetalLB IPs
lb1.persica 300 IN A 192.168.3.65
lb2.persica 300 IN A 192.168.3.66
lb3.persica 300 IN A 192.168.3.67
lb4.persica 300 IN A 192.168.3.68
lb5.persica 300 IN A 192.168.3.69
}}}
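
To show where those names get used, here's a minimal sketch of an Ingress that routes one of them to an app. Everything specific in it is an assumption for illustration: the `example.com` zone, the `flask` namespace, and a backend service called `flask` listening on port 5000. I've left out `ingressClassName` so that the cluster's default ingress class picks it up:
{{{#!yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app1
  namespace: flask               # hypothetical namespace
spec:
  rules:
    # Must match the full DNS name that resolves to the nodes
    - host: app1.persica.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: flask      # hypothetical service name
                port:
                  number: 5000   # hypothetical service port
}}}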

== Load balancing with MetalLB ==

I thought I wouldn't need it, but it looks like I do if I want sensible, useful functionality. Here's an explanation of why I want to use MetalLB, and it's not just for BGP-based configs: https://github.com/kubernetes/ingress-nginx/blob/main/docs/deploy/baremetal.md

Install it:
 1. RTFM: https://metallb.universe.tf/installation/
 2. Grab the manifest and pull it into the repo. I'm using this version because it's similar to what we run at work: https://github.com/metallb/metallb/blob/v0.9/manifests/metallb.yaml
 3. Create the namespace first, I'm putting it into the System project: {{{#!yaml
apiVersion: v1
kind: Namespace
metadata:
  name: metallb-system
  # This is the System project on the prod cluster
  annotations:
    field.cattle.io/projectId: c-gfnh7:p-db8t4
  labels:
    app: metallb
}}}
 4. Create the metallb resources: `kubectl apply -f 01-metallb.yaml`
 5. Create the memberlist secret that the nodes need to communicate: `kubectl -n metallb-system create secret generic memberlist --from-literal=secretkey="$(openssl rand -base64 128)"`
 6. Setup the configmap to configure its behaviour, they have a fully documented example here: https://github.com/metallb/metallb/blob/v0.9/manifests/example-config.yaml
 7. Apply the config: `kubectl apply -f 02-config.yaml`

=== Configure BGP ===
https://metallb.universe.tf/configuration/#bgp-configuration

Let's go for an iBGP design here: the router (helena) and the cluster nodes all belong to the same private AS, number 64520.

On helena: {{{
/routing/bgp/connection/add name=persica1 remote.address=192.168.3.3 as=64520 local.role=ibgp
/routing/bgp/connection/add name=persica2 remote.address=192.168.3.4 as=64520 local.role=ibgp
/routing/bgp/connection/add name=persica3 remote.address=192.168.3.5 as=64520 local.role=ibgp
}}}

And in metallb we drop this config in: {{{#!yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    peers:
      - peer-address: 192.168.3.1
        peer-asn: 64520
        my-asn: 64520
    address-pools:
      - name: persica-lb
        protocol: bgp
        addresses:
          - 192.168.3.64/26
        avoid-buggy-ips: true
        auto-assign: false
        bgp-advertisements:
          - aggregation-length: 32
            localpref: 100
            communities:
              - no-export
    bgp-communities:
      # "Do not advertise this route to external BGP peers"
      no-export: 65535:65281
      # "Do not advertise this route to any peer"
      no-advertise: 65535:65282
}}}

The moment I apply this, helena sees a connection from the persica nodes, awesome.

Then we just need to define a load-balanced service in k8s, and the nodes will start advertising its address.

With a bit of faffing, it does just that. I had to force it to pick the IP I wanted, because it uses .64 initially, which I don't want. Our version doesn't respect the request by annotation, but `spec.loadBalancerIP` works (though it's deprecated).
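
A minimal sketch of such a service, pinned to the `lb1` address from the DNS records earlier. The service name, namespace, selector and ports are all placeholders I've made up, so swap in whatever your app actually uses:
{{{#!yaml
apiVersion: v1
kind: Service
metadata:
  name: demo-lb                  # hypothetical name
  namespace: default             # hypothetical namespace
spec:
  type: LoadBalancer
  # Ask for a specific address from the pool instead of the first free one
  loadBalancerIP: 192.168.3.65
  selector:
    app: demo                    # hypothetical pod label
  ports:
    - name: http
      protocol: TCP
      port: 80
      targetPort: 8080           # hypothetical container port
}}}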




=== Try MetalLB in Layer 2 mode first ===
NB: this is old

I'll use it in L2 mode with ARP/NDP I think. Just need to dedicate a bunch of IPs to it so it can manage the traffic to them.

Holy crap I think I got it working.
 * We'll use it in L2 mode, no BGP yet
 * Set aside 192.168.3.64 - 192.168.3.127 for load balanced services
 * Install it via Rancher helm chart interface, no config
 * Push a simple address pool and advertisement config {{{#!yaml
---
# https://metallb.universe.tf/configuration/#layer-2-configuration

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: metallb-pool-1
  namespace: metallb-system
spec:
  addresses:
    - 192.168.3.65-192.168.3.126

---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: metallb-pool-1
  namespace: metallb-system
# The spec below isn't strictly needed, because an L2Advertisement with no pools listed claims all IPAddressPools by default
spec:
  ipAddressPools:
    - metallb-pool-1
}}}
 * Copy the existing redis service in the example, and add an external access route to it as a secondary service {{{#!yaml
---
apiVersion: v1
kind: Service
metadata:
  namespace: flask
  name: redis-ext
  labels:
    name: redis
    kubernetes.io/name: "redis"

spec:
  selector:
    app: redis
  ports:
    - name: redis
      protocol: TCP
      port: 6379
  type: LoadBalancer
}}}

It's really as simple as adding `type: LoadBalancer`, then MetalLB selects the next free IP itself and binds it.

=== Try it in BGP mode next ===

TBC


== Making ingress work - was this for the kubeadm method? ==

I don't understand this well enough, but I want to use ingress-nginx. Here's a page about it, albeit not using raw kubectl: https://kubernetes.github.io/ingress-nginx/kubectl-plugin/

Maybe this one too: https://medium.com/tektutor/using-nginx-ingress-controller-in-kubernetes-bare-metal-setup-890eb4e7772





== Things that suck ==

=== cgroups ===

Alma9 introduces cgroups v2, which weren't a thing on Centos 7. That means you have to deal with them now. They tend to break docker a lot, so just revert back to v1 cgroups.

How it manifests:
 * For context: fucking cgroups, k3s dies instantly
 * https://github.com/rancher/rancher/issues/35201#issuecomment-947331154
 * https://groups.google.com/g/linux.debian.bugs.dist/c/Z-Cc0WmlEGA/m/NB6XGDsnAwAJ
 * Finally found a simple solution: https://github.com/rancher/rancher/issues/36165

Fix it:
 * Append an option to the kernel cmdline, this'll do it for you: {{{
grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
}}}
 * Then reboot for it to take effect
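
After the reboot it's worth confirming which hierarchy you actually ended up with. This is a small sketch built on the filesystem-type check from the Kubernetes docs; the `cgroup_version` function name is just something I made up:
{{{
# Report which cgroup hierarchy is mounted at the given path
cgroup_version() {
    case "$(stat -fc %T "${1:-/sys/fs/cgroup}" 2>/dev/null)" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

cgroup_version
}}}
If it still says `v2`, the kernel argument didn't take, so check `/proc/cmdline`.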

=== Networking kernel modules ===

The problem: you fixed cgroups but now you get an error like this when Rancher starts up: {{{
I1125 03:57:50.129406      93 network_policy_controller.go:163] Starting network policy controller
F1125 03:57:50.130225      93 network_policy_controller.go:404] failed to run iptables command to create KUBE-ROUTER-FORWARD chain due to running [/usr/bin/iptables -t filter -S KUBE-ROUTER-FORWARD 1 --wait]: exit status 3: iptables v1.8.8 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
panic: F1125 03:57:50.130225      93 network_policy_controller.go:404] failed to run iptables command to create KUBE-ROUTER-FORWARD chain due to running [/usr/bin/iptables -t filter -S KUBE-ROUTER-FORWARD 1 --wait]: exit status 3: iptables v1.8.8 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
}}} And then it explodes and the container dies.

Turns out you need some iptables modules loaded. This fixed it the first time: {{{
[root@kalina ~]# modprobe iptable_nat
[root@kalina ~]# modprobe br_netfilter
}}} But it happened again the next time I rebuilt the cluster. You gotta make it stick by adding config fragments to `/etc/modules-load.d`
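
A minimal sketch of such a fragment, listing the same two modules as the `modprobe` commands above. The file name `/etc/modules-load.d/rancher.conf` is my own choice, any name ending in `.conf` in that directory should do: {{{
# /etc/modules-load.d/rancher.conf
# One module name per line, loaded by systemd-modules-load at boot
iptable_nat
br_netfilter
}}}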

Explanations:
 * This kinda describes the issue: https://slack-archive.rancher.com/t/9761163/hey-folks-i-have-a-quick-question-for-a-newbie-i-have-setup-
 * Yeah it turns out that the rancher container fucking dies in the arse with no explanation when you don't have the iptables modules loaded, duhhhh. I figured that out and made them load on-boot like so: https://forums.centos.org/viewtopic.php?t=72040

=== Firewalls ===

Now whyTF can't persica2 and persica3 contact services on persica1..? Aha, firewalld is running on persica1, and it shouldn't be. Need to disable it using ansible as well.
{{{
systemctl disable firewalld.service --now
}}}
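
Since this needs to go into ansible as well, a play for it might look something like the sketch below. The `persica` host group is an assumption about the inventory, adjust it to suit:
{{{#!yaml
- name: Disable firewalld on the cluster nodes
  hosts: persica                 # hypothetical inventory group
  become: true
  tasks:
    - name: Stop firewalld and keep it from starting at boot
      ansible.builtin.systemd:
        name: firewalld
        state: stopped
        enabled: false
}}}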

Yeah that's jank, but hey it's what they tell you to do! https://ranchermanager.docs.rancher.com/how-to-guides/advanced-user-guides/open-ports-with-firewalld

"We recommend disabling firewalld. For Kubernetes 1.19.x and higher, firewalld must be turned off."

=== Cleanup and try again ===

Find that it doesn't work and you can't make it work, awesome. Tear it all down and start again, killing every container, nuking files, and starting from scratch: https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/manage-clusters/clean-cluster-nodes#directories-and-files

Eventually, you get a cluster with three working nodes in it!!

=== Installing older versions of kubectl ===

Running an older version of k8s and need an older version of kubectl to go with it? You're shit out of luck, my friend!

https://kubernetes.io/blog/2023/08/15/pkgs-k8s-io-introduction/

They moved to new package repos in 2023, and as of early 2024 the old repos are gone! The new repos only have v1.24 and newer, so if you need anything older it's just not there.

Looks like our last option is: "You can directly download binaries instead of using packages. As an example, see ''Without a package manager'' instructions in [[https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm|Installing kubeadm]] document."

And you end up here: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-kubectl-binary-with-curl-on-linux

Here's a modern way of defining the repo on debian-type systems btw:
{{{
# Our cluster is k8s v1.23 so we can use kubectl as late as 1.24

curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.24/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

cat <<EOF > /etc/apt/sources.list.d/kubernetes.sources
X-Repolib-Name: Kubernetes
Enabled: yes
Types: deb
URIs: https://pkgs.k8s.io/core:/stable:/v1.24/deb/
Suites: /
Architectures: arm64
Signed-By: /etc/apt/keyrings/kubernetes-apt-keyring.gpg
EOF

apt update
apt install -y kubectl
}}}

=== You can't use selinux ===
It just breaks way too much shit, it's not worth it. Install something new and it doesn't work? You'll forever be wondering "is it selinux" immediately after it fails.

Useful(?) links
