persica cluster

This is a cluster of three identical nodes, named persica1/2/3

Alma Linux 9.1 x64
Dell Optiplex 9020 Micro
- Intel Core i5-4590T @ 2.00 GHz
- 16gb DDR3-1600
- 128gb SSD

Contents

persica cluster
1. k8s notes
2. Build notes

k8s notes

Make a simple 3-node cluster
Single-node control plane will run externally, on illustrious
Use kubeadm to build the cluster: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
Selected containerd as the container runtime
Will use Flannel as the networking plugin
Allocated IPs:
- persica1 / 192.168.1.31
- persica2 / 192.168.1.32
- persica3 / 192.168.1.33
Ingress: undecided so far
Cgroup driver: let's use systemd
k8s version: whatever is latest right now (2023-04-04)

Build notes

Per node

Update the BIOS using this guide: https://www.dell.com/support/kbdoc/en-au/000131486/update-the-dell-bios-in-a-linux-or-ubuntu-environment#updatebios2015
- Despite the usual Dell docs saying you need to make a DOS boot disk and run the flash updater app from there, it turns out that the BIOS Flash Update target (mash F12 to get the one-time boot menu) can read the 9020MA19.exe file from a FAT32 filesystem on a USB stick just fine
- Not sure if this only works in UEFI mode or not, but I kinda don't care because we want to be in UEFI mode
- This applies to systems made from 2015 or later
- The latest BIOS update for the Optiplex 9020M is version A19, released
Set BIOS to full UEFI mode, no legacy
We'll be using DHCP, so find the MAC address so we can give it a consistent IP address when it boots
Add the MAC address and IP assignment to dnsmasq on calico (a pihole box)
- /etc/dnsmasq.d/02-pihole-dhcp-persica-cluster.conf
- Something like this
```
dhcp-host=98:90:96:BE:89:52,set:persica,192.168.1.31,persica1,5m
# one dhcp-host line per host
dhcp-boot=tag:persica,grub/grubx64.efi,illustrious.thighhighs.top,192.168.1.12
```
- Run pihole restartdns after making changes
PXE boot for kickstart install, which will hit calico for DHCP, then illustrious for the boot image and kickstart config

tftpd-hpa is running on illustrious

Upstream repo mirror: https://repo.almalinux.org/almalinux/9/BaseOS/x86_64/os/EFI/BOOT/

Drop that content in /srv/tftp/

root@illustrious:/srv/tftp# tree
.
├── BOOTX64.EFI
├── default.efi
├── grub
│   ├── grub.cfg
│   ├── grub.cfg-01-98-90-96-be-89-52
│   └── grubx64.efi
├── images
│   └── Alma-9.1
│       ├── initrd.img
│       └── vmlinuz
├── ipxe.efi
└── shimx64.efi

Add a grub config fragment for the host's MAC address: grub.cfg-01-xx-xx-xx-xx-xx-xx
Make sure the grub config has the correct URL for its kickstart config

kickstart file served from /data/www/illustrious/ks: https://illustrious.thighhighs.top/ks/persica1.ks.cfg
- Make sure your per-host config file has the correct name
KS references:
- Reference manual: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/performing_an_advanced_rhel_9_installation/kickstart-commands-and-options-reference_installing-rhel-as-an-experienced-user#keyboard-required_kickstart-commands-for-system-configuration
- Generator tool: https://access.redhat.com/labs/kickstartconfig/
k8s doesn't play well with swap so we need to disable it. Provision a minimal swap volume of 1gb, then disable it later

This was useful for figuring out the TFTP stuff for the first time: https://askubuntu.com/questions/1183487/grub2-efi-boot-via-pxe-load-config-file-automatically

Paths are hardcoded into the grubx64.efi binary, meaning HDD and PXE versions aren't the same. Make sure you put all the grub stuff in a grub/ directory. Check the $prefix to see where it's searching:

UEFI settings

Get to the UEFI

Probably get stuck in windows for first boot
Win, then "UEFI", get to advanced startup options
Boot with Advanced Boot Options
Troubleshoot, Advanced Options, UEFI Firmware Settings, Restart

Record details

Get the LOM MAC Address from Settings, General, System Info

Change settings

General
- Boot Sequence
  - Select UEFI boot list
- Advanced Boot Options
  - Disable Legacy OPROMs
- UEFI Boot Path Security
  - Set to Never
- Date/Time
  - Set clock to approx correct for UTC time
System Configuration
- Integrated NIC
  - Enable UEFI Network Stack
  - Enabled w/ PXE
- SATA Operation
  - AHCI
- SMART Reporting
  - Disabled, we don't need it
- Audio
  - Disable all audio, we don't need it
Security
- TPM Security
  - Check everything except Clear
  - Activated
- CPU XD support
  - Enabled
Secure Boot
- Secure Boot Enable
  - Disabled
Performance
- Multi-core support: All
- Speedstep: Enabled
- C-states: Enabled
- Limit CPUID: Disabled
- Turboboost: Enabled
Power Management
- AC Recovery: Power On
- Deep Sleep Control: Disabled
- USB Wake Support: Enable USB wake from Standby
- Wake on LAN/WLAN: LAN with PXE Boot
- Block Sleep: Enable blocking of sleep
POST Behaviour
- Keyboard Errors: Disable error detection
Virtualisation support
- Enable VT
- Enable VT-d
- Enable Trusted Execution

Reboot and go back in again.

Boot only from IPv4 with NIC (PXE boot)

Ansible management after kickstart build

I should ansible'ise everything, making minimal assumptions about the kickstart part of the process.

I'm keeping a simple ansible repo in ~/git/persica-ansible/

Pre-bootstrap

I have a basic set of roles to get the nodes into a workable state, right before I invoke kubeadm for the first time.

---
- name: Configure persica k8s cluster
  hosts: persica
  roles:
    - role: common
      tags: common
    - role: docker_for_kube
      tags: docker_for_kube
    - role: kube_daemons
      tags: kube_daemons

Initialise the control plane

This is manual of course, no ansible here.

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#initializing-your-control-plane-node

This will be a single-node control plane, but we should specify --control-plane-endpoint anyway. persica1 is going to be our control plane.
Our Pod network add-on will be Flannel. We can specify --pod-network-cidr but I'll try without first.
It'll detect containerd
The default --apiserver-advertise-address will be fine, let it autodetect

I added a custom CNAME record to local pihole (calico) and Gandi (public service), for persica-endpoint => persica1. Unlike the DHCP stuff, this is in the general DNS web interface, not a custom config file.

After a bunch of faffing around to fix up the firewall config, bridge filtering kernel module, and enabling ipv4 forwarding, the init begins after passing preflight checks.

[root@persica1 ~]# kubeadm init --control-plane-endpoint=persica-endpoint
[init] Using Kubernetes version: v1.27.1
[preflight] Running pre-flight checks
        [WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
W0415 03:43:19.958609   39430 images.go:80] could not find officially supported version of etcd for Kubernetes v1.27.1, falling back to the nearest etcd version (3.5.7-0)
W0415 03:43:52.646765   39430 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.6" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.k8s.io/pause:3.9" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local persica-endpoint persica1] and IPs [10.96.0.1 192.168.1.31]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost persica1] and IPs [192.168.1.31 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost persica1] and IPs [192.168.1.31 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
W0415 03:44:21.781505   39430 images.go:80] could not find officially supported version of etcd for Kubernetes v1.27.1, falling back to the nearest etcd version (3.5.7-0)
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
        timed out waiting for the condition

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

No worky :/

https://serverfault.com/questions/1116281/kubeadm-1-25-init-failed-on-debian-11-with-containerd-connection-refused

Maybe I need the control plane on a separate node after all. I'll try illustrious.

copy containerd/config.toml to illustrious
apt install -y apt-transport-https ca-certificates curl
curl -fsSLo /etc/apt/trusted.gpg.d/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg

prep repo defn

cat <<EOF > /etc/apt/sources.list.d/kubernetes.sources
X-Repolib-Name: Kubernetes
Enabled: yes
Types: deb
URIs: https://apt.kubernetes.io/
Suites: kubernetes-xenial
Architectures: amd64
Components: main
Signed-By: /etc/apt/trusted.gpg.d/kubernetes-archive-keyring.gpg
X-Repolib-ID: Kubernetes
EOF

apt update
apt install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl

Now try kubeadm again.