Differences between revisions 10 and 31 (spanning 21 versions)

persica cluster

This is a cluster of three identical nodes, named persica1/2/3

Alma Linux 9.3 x64
Dell Optiplex 9020 Micro
- Intel Core i5-4590T @ 2.00 GHz
- 16GB DDR3-1600
- 128GB SSD SK hynix SC311 SATA M.2
- I bought the three of them for $405 in total, so $135 AUD each, in March 2023.

I last touched this in April 2023 and it was very annoying to get as far as I did. Next time I look at it, I think I will rebuild the cluster from scratch again, and use a different guide. Something with actual explanations and a few opinions, like this one: https://github.com/hobby-kube/guide

Another rebuild attempt in late 2023

A few changes for this one:

I'm going to use Rancher this time, or that guide linked above
Alma 9.3 because it's the latest
Move them to the "subnet" of 192.168.1.32/29 so I can configure the router to give them DHCP options easily
- persica1 / 192.168.1.33
- persica2 / 192.168.1.34
- persica3 / 192.168.1.35
Buy another node and run the controller on that rather than illustrious, which in this case might be the rancher docker container
- kalina / 192.168.1.39
- persica / CNAME to kalina
Go with Longhorn for PVCs
Dunno what to do about ingress yet
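As a sanity check on the address plan above, here is a small sketch that prints the range a CIDR block covers, using only shell arithmetic. The function name cidr_range is made up for this note. It shows that 192.168.1.32/29 spans .32 through .39, so kalina at .39 sits on the last address of the block. That's fine while the "subnet" is only a grouping for DHCP options, but .32 and .39 would become the network and broadcast addresses if it were ever configured as a real /29.

```shell
# Print the first and last address covered by an IPv4 CIDR block.
# cidr_range is a hypothetical helper, not part of any tool used here.
cidr_range() (
  ip=${1%/*}
  prefix=${1#*/}
  IFS=.
  set -- $ip            # split the dotted quad into $1..$4
  addr=$(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
  mask=$(( (4294967295 << (32 - $prefix)) & 4294967295 ))
  first=$(( $addr & $mask ))
  last=$(( $first | (4294967295 ^ $mask) ))
  printf '%d.%d.%d.%d-%d.%d.%d.%d\n' \
    $(( ($first >> 24) & 255 )) $(( ($first >> 16) & 255 )) \
    $(( ($first >> 8) & 255 ))  $(( $first & 255 )) \
    $(( ($last >> 24) & 255 ))  $(( ($last >> 16) & 255 )) \
    $(( ($last >> 8) & 255 ))   $(( $last & 255 ))
)

cidr_range 192.168.1.32/29   # prints 192.168.1.32-192.168.1.39
```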

Prepare kalina controller node

Build servers/azusa as the network services node.

Kickstart build kalina using the configs on azusa.

Run ansible against kalina, this will configure the OS and install docker.

Check that docker works

docker run hello-world

Push the certs from illustrious to kalina: https://ranchermanager.docs.rancher.com/pages-for-subheaders/rancher-on-a-single-node-with-docker#option-c-bring-your-own-certificate-signed-by-a-recognized-ca

On illustrious:

cd /etc/ssl/

rsync -avx \
  STAR_thighhighs_top.key \
  STAR_thighhighs_top.crtbundled \
  STAR_thighhighs_top.key.2023 \
  STAR_thighhighs_top.crtbundled.2023 \
  root@kalina:/etc/ssl/

Then on kalina:

chown root:root /etc/ssl/STAR_thighhighs_top.*

This cgroup stuff might not affect Alma, we'll see

For context: Fucking cgroups, k3s dies instantly:

https://github.com/rancher/rancher/issues/35201#issuecomment-947331154
https://groups.google.com/g/linux.debian.bugs.dist/c/Z-Cc0WmlEGA/m/NB6XGDsnAwAJ
Answer is here, you append stuff to cmdline: https://github.com/rancher/rancher/issues/36165

Looks like cgroups is still boned on Alma9 as well, tried running this:

grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"

Then rebooting.
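After the reboot, one way to confirm which cgroup hierarchy the host actually came up with is to look at the filesystem type mounted on /sys/fs/cgroup. This is a minimal sketch. The helper name cgroup_version is made up, and the mapping (cgroup2fs for v2, tmpfs for v1) is the usual one rather than anything specific to Alma.

```shell
# Map the filesystem type of /sys/fs/cgroup to a cgroup version.
# "cgroup2fs" means the unified (v2) hierarchy; "tmpfs" means v1 or hybrid.
cgroup_version() {
  case "$1" in
    cgroup2fs) echo v2 ;;
    tmpfs)     echo v1 ;;
    *)         echo unknown ;;
  esac
}

# On a live host (GNU stat assumed):
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo missing)
echo "cgroup hierarchy: $(cgroup_version "$fstype") ($fstype)"

# Whether the kernel argument took effect can be seen in /proc/cmdline.
```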

Buuut it's still no good, wtf

I1125 03:57:50.129406      93 network_policy_controller.go:163] Starting network policy controller
F1125 03:57:50.130225      93 network_policy_controller.go:404] failed to run iptables command to create KUBE-ROUTER-FORWARD chain due to running [/usr/bin/iptables -t filter -S KUBE-ROUTER-FORWARD 1 --wait]: exit status 3: iptables v1.8.8 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
panic: F1125 03:57:50.130225      93 network_policy_controller.go:404] failed to run iptables command to create KUBE-ROUTER-FORWARD chain due to running [/usr/bin/iptables -t filter -S KUBE-ROUTER-FORWARD 1 --wait]: exit status 3: iptables v1.8.8 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.


goroutine 14975 [running]:
k8s.io/klog/v2.(*loggingT).output(0x82f93c0, 0x3, 0x0, 0xc004466e70, 0x1, {0x67beecd?, 0x2?}, 0xc00629dc00?, 0x0)
        /go/pkg/mod/github.com/k3s-io/klog/v2@v2.80.1-k3s1/klog.go:926 +0x6bd
k8s.io/klog/v2.(*loggingT).printfDepth(0x82f93c0, 0x13?, 0x0, {0x0, 0x0}, 0x19c?, {0x4ec51ee, 0x3b}, {0xc004c5a6e0, 0x2, ...})
        /go/pkg/mod/github.com/k3s-io/klog/v2@v2.80.1-k3s1/klog.go:736 +0x1f3
k8s.io/klog/v2.(*loggingT).printf(...)
        /go/pkg/mod/github.com/k3s-io/klog/v2@v2.80.1-k3s1/klog.go:718
k8s.io/klog/v2.Fatalf(...)
        /go/pkg/mod/github.com/k3s-io/klog/v2@v2.80.1-k3s1/klog.go:1598
github.com/cloudnativelabs/kube-router/v2/pkg/controllers/netpol.(*NetworkPolicyController).ensureTopLevelChains(0xc0084ef7a0)
        /go/pkg/mod/github.com/k3s-io/kube-router/v2@v2.0.1-0.20230411195838-cced939a8ba1/pkg/controllers/netpol/network_policy_controller.go:404 +0x166f
github.com/cloudnativelabs/kube-router/v2/pkg/controllers/netpol.(*NetworkPolicyController).Run(0xc0084ef7a0, 0xc00f34eb40, 0xc0009dc4e0, 0xc006fdd380)
        /go/pkg/mod/github.com/k3s-io/kube-router/v2@v2.0.1-0.20230411195838-cced939a8ba1/pkg/controllers/netpol/network_policy_controller.go:167 +0x175
created by github.com/k3s-io/k3s/pkg/agent/netpol.Run
        /go/src/github.com/k3s-io/k3s/pkg/agent/netpol/netpol.go:141 +0xd85

Is it iptables, or is that a red herring? I can't tell if this helped:

[root@kalina ~]# modprobe iptable_nat
[root@kalina ~]# modprobe br_netfilter

But now I don't get an error when running iptables -t filter -S KUBE-ROUTER-FORWARD 1 --wait inside the container.
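The modprobe calls only last until the next reboot. If they do turn out to matter, a drop-in under /etc/modules-load.d/ would load them at boot. This is a sketch only. The helper persist_modules and the file name k3s-netfilter.conf are invented for this note, and it writes to a scratch directory here so it can be tried without root.

```shell
# Write a modules-load.d drop-in: persist_modules DIR MODULE...
# systemd-modules-load reads one module name per line from *.conf files.
persist_modules() {
  dir=$1
  shift
  mkdir -p "$dir"
  printf '%s\n' "$@" > "$dir/k3s-netfilter.conf"   # file name is arbitrary
}

# Dry run into a scratch directory; use /etc/modules-load.d for real.
scratch=$(mktemp -d)
persist_modules "$scratch" iptable_nat br_netfilter
cat "$scratch/k3s-netfilter.conf"
```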

This kinda describes the issue: https://slack-archive.rancher.com/t/9761163/hey-folks-i-have-a-quick-question-for-a-newbie-i-have-setup-

Run Rancher

Run the Rancher container according to this note about ARM systems. It just tells you to specify an exact version, so you know it's built with arm64 support: https://ranchermanager.docs.rancher.com/how-to-guides/advanced-user-guides/enable-experimental-features/rancher-on-arm64

docker run -d --restart=unless-stopped \
  -p 443:443 \
  -v /etc/ssl/STAR_thighhighs_top.crtbundled:/etc/rancher/ssl/cert.pem \
  -v /etc/ssl/STAR_thighhighs_top.key:/etc/rancher/ssl/key.pem \
  --privileged \
  rancher/rancher:v2.7.9 \
  --no-cacerts

It'll take some time to start. Then you can try hitting the Rancher web UI: https://kalina.thighhighs.top/
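Rather than refreshing the browser, a small retry loop can poll until the UI answers. This is a generic sketch. The retry helper is made up, and the attempt count and delay in the usage comment are guesses, not measured startup times.

```shell
# Retry a command until it succeeds: retry ATTEMPTS DELAY_SECONDS CMD...
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "gave up after $n attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Harmless demonstration:
retry 3 0 true && echo "ready"

# Against Rancher it might look like this (not run here):
#   retry 60 5 curl -fsS -o /dev/null https://kalina.thighhighs.top/
```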

Login with the local user password as directed, then let it set the new admin password. Record it somewhere safe, and set the server URL to https://persica.thighhighs.top

Build the nodes

Prepare the nodes as described below in the "Hardware prep for the cluster nodes" section. This is mostly a one-time thing.

This just works now, huzzah! Manually kick the BIOS of each node to do a one-time PXE boot, then let it do its thing.

Node configs

Salvage my old ansible playbook stuff and copy it to asval. Run it from there.

apt install -y ansible sshpass

cd ~/git/persica-ansible/
make persica ARGS="-C --tags common"

Stand up the cluster

https://ranchermanager.docs.rancher.com/pages-for-subheaders/use-existing-nodes

Create cluster in rancher, select RKE1, leave options as default, tick the boxes and find the command to run on each node.

docker is already installed from my last attempt, try to get it going.

systemctl enable docker.service --now

docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run  rancher/rancher-agent:v2.7.9 --server https://persica.thighhighs.top --token lx5qjbl4dn7zkpbmt5qqz8qfdvtgsl2x5ft95j8lh785bxrjjccq2t --etcd --controlplane --worker

docker logs recursing_proskuriakova -f

Run this on each node to onboard it to the cluster.

Now whyTF can't persica2 and persica3 contact services on persica1..? Aha, firewalld is running on persica1, and it shouldn't be. Need to disable it on all three nodes.

systemctl disable firewalld.service --now
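Since the same command has to run on all three nodes, a loop saves some typing. This sketch assumes root SSH access to each node, which isn't stated anywhere above. The on_all_nodes helper and the APPLY switch are made up for this note. By default it only prints what it would run.

```shell
NODES="persica1 persica2 persica3"

# Run a command on every node over SSH.
# Prints the commands unless APPLY=1 is set in the environment.
on_all_nodes() {
  for node in $NODES; do
    if [ "${APPLY:-0}" = "1" ]; then
      ssh "root@$node" "$*"
    else
      echo "ssh root@$node $*"
    fi
  done
}

on_all_nodes systemctl disable firewalld.service --now
```

Setting APPLY=1 before calling it would run the command for real.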

Find that it doesn't work and you can't make it work. Tear it all down and start again, killing every container, nuking files, and starting from scratch: https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/manage-clusters/clean-cluster-nodes#directories-and-files

Eventually, you get a cluster with three working nodes in it!!

Install kubectl on cluster controller vector

https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-using-native-package-management

# Our cluster is k8s v1.23 so we can use kubectl as late as 1.24

curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.24/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

cat <<EOF > /etc/apt/sources.list.d/kubernetes.sources
X-Repolib-Name: Kubernetes
Enabled: yes
Types: deb
URIs: https://pkgs.k8s.io/core:/stable:/v1.24/deb/
Suites: /
Architectures: arm64
Signed-By: /etc/apt/keyrings/kubernetes-apt-keyring.gpg
EOF

apt update
apt install -y kubectl

Hit the kebab menu on the cluster and copy the kubeconfig to your clipboard: https://persica.thighhighs.top/dashboard/c/_/manager/provisioning.cattle.io.cluster

Go paste that into ~/.kube/config in your account on vector, now you can run kubectl there!

Install Longhorn cluster storage manager

This is done from the builtin Helm charts, let it go to work. It's a couple of simple clicks.

For some reason the predefined things you can configure on the helm chart don't include the local path to the disk on each node. Which is pretty bloody obvious you'd think, but no. It'll default to /var/lib/longhorn or something unless you override it. Go to the YAML page and change the defaultDataPath to /persist/longhorn/ instead, then run the install.

I had to disable selinux on the nodes because it broke volume attachment for some unknown reason. After doing that it eventually came good after some retries.
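For the record, a sketch of how the SELinux mode can be switched persistently. The helper set_selinux_mode is made up for this note. It edits whatever file it's pointed at, normally /etc/selinux/config, and is shown against a scratch copy here. Using permissive instead of disabled is my suggestion rather than what I did: it keeps denials in the audit log, which might eventually explain the volume attachment failure.

```shell
# Set the persistent SELinux mode: set_selinux_mode MODE CONFIG_FILE
# MODE is one of enforcing, permissive, disabled.
set_selinux_mode() {
  tmp=$(mktemp)
  # Only the SELINUX= line is touched; SELINUXTYPE= is left alone.
  sed "s/^SELINUX=.*/SELINUX=$1/" "$2" > "$tmp" && cat "$tmp" > "$2"
  rm -f "$tmp"
}

# Demonstration against a scratch copy of the config:
scratch=$(mktemp)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > "$scratch"
set_selinux_mode permissive "$scratch"
cat "$scratch"

# On a real node (as root), roughly:
#   setenforce 0                                   # permissive until reboot
#   set_selinux_mode permissive /etc/selinux/config
```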

I tried out this dude's demo app that uses flask and redis to deploy a trivial website, that was a nifty test of all the bits working together as expected:

Blessedly the ingress just works. No idea what to do yet to make a service that presents itself on public IPs.

Try MetalLB

Holy crap I think I got it working.

We'll use it in L2 mode, no BGP yet
Set aside 192.168.1.57 - 192.168.1.63 for load balanced services
Install it via Rancher helm chart interface, no config

Push a simple address pool and advertisement config

---
# https://metallb.universe.tf/configuration/#layer-2-configuration

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: metallb-pool-1
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.57-192.168.1.63

---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: metallb-pool-1
  namespace: metallb-system
# The spec below is optional: an L2Advertisement with no ipAddressPools listed advertises all IPAddressPools by default
spec:
  ipAddressPools:
    - metallb-pool-1

Copy the existing redis service in the example, and add an external access route to it as a secondary service

---
apiVersion: v1
kind: Service
metadata:
  namespace: flask
  name: redis-ext
  labels:
    name: redis
    kubernetes.io/name: "redis"

spec:
  selector:
    app: redis
  ports:
    - name: redis
      protocol: TCP
      port: 6379
  type: LoadBalancer

It's really as simple as adding type: LoadBalancer, then MetalLB selects the next free IP itself and binds it.
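To check that the address MetalLB handed out really came from the pool, something like the following would do. It's a sketch: ip_to_int and ip_in_range are made-up helpers, and the kubectl line in the comment assumes the redis-ext service in the flask namespace from the manifest above.

```shell
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeeds if IP ($1) lies within START ($2) .. END ($3), inclusive.
ip_in_range() {
  [ "$(ip_to_int "$1")" -ge "$(ip_to_int "$2")" ] &&
  [ "$(ip_to_int "$1")" -le "$(ip_to_int "$3")" ]
}

# On the cluster, the assigned address could be fetched with:
#   kubectl -n flask get svc redis-ext \
#     -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
ip_in_range 192.168.1.57 192.168.1.57 192.168.1.63 && echo "in pool"
```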

Hardware prep for the cluster nodes

Set up each new node like so:

Ansible management after kickstart build

This is getting everything to the state where I can bootstrap the cluster. I should ansible'ise everything, making minimal assumptions about the kickstart part of the process.

I'm keeping a simple ansible repo in ~/git/persica-ansible/

I have a basic set of roles to get the nodes into a workable state, right before I invoke kubeadm for the first time.

---
- name: Configure persica k8s cluster
  hosts: persica
  roles:
    - role: common
      tags: common
    - role: docker_for_kube
      tags: docker_for_kube
    - role: kube_daemons
      tags: kube_daemons

Initialise the control plane

This is manual of course, no ansible here.

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#initializing-your-control-plane-node

This will be a single-node control plane, but we should specify --control-plane-endpoint anyway. persica1 is going to be our control plane.
Our Pod network add-on will be Flannel. We can specify --pod-network-cidr but I'll try without first.
It'll detect containerd
The default --apiserver-advertise-address will be fine, let it autodetect

I added a custom CNAME record to local pihole (calico) and Gandi (public service), for persica-endpoint => persica1. Unlike the DHCP stuff, this is in the general DNS web interface, not a custom config file.

After a bunch of faffing around to fix up the firewall config, bridge filtering kernel module, and enabling ipv4 forwarding, the init begins after passing preflight checks.
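For next time, the faffing roughly boils down to the following. This is a sketch based on the usual container runtime prerequisites rather than a record of exactly what I ran. The helper write_k8s_sysctl is made up, and it writes to a scratch directory here so it can be tried without root. The two ports are the ones kubeadm warns about.

```shell
# Write the sysctl drop-in for bridged traffic and forwarding.
# $1 is the target directory (/etc/sysctl.d on a real node, needs root).
write_k8s_sysctl() {
  mkdir -p "$1"
  cat > "$1/k8s.conf" <<'EOF'
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
}

scratch=$(mktemp -d)
write_k8s_sysctl "$scratch"
cat "$scratch/k8s.conf"

# On a real node (as root), roughly:
#   modprobe br_netfilter
#   write_k8s_sysctl /etc/sysctl.d && sysctl --system
#   firewall-cmd --permanent --add-port=6443/tcp --add-port=10250/tcp
#   firewall-cmd --reload
```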

[root@persica1 ~]# kubeadm init --control-plane-endpoint=persica-endpoint
[init] Using Kubernetes version: v1.27.1
[preflight] Running pre-flight checks
        [WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
W0415 03:43:19.958609   39430 images.go:80] could not find officially supported version of etcd for Kubernetes v1.27.1, falling back to the nearest etcd version (3.5.7-0)
W0415 03:43:52.646765   39430 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.6" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.k8s.io/pause:3.9" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local persica-endpoint persica1] and IPs [10.96.0.1 192.168.1.31]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost persica1] and IPs [192.168.1.31 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost persica1] and IPs [192.168.1.31 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
W0415 03:44:21.781505   39430 images.go:80] could not find officially supported version of etcd for Kubernetes v1.27.1, falling back to the nearest etcd version (3.5.7-0)
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
        timed out waiting for the condition

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher

No worky :/

https://serverfault.com/questions/1116281/kubeadm-1-25-init-failed-on-debian-11-with-containerd-connection-refused

Maybe I need the control plane on a separate node after all. I'll try illustrious.

copy containerd/config.toml to illustrious
apt install -y apt-transport-https ca-certificates curl
curl -fsSLo /etc/apt/trusted.gpg.d/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg

prep repo defn

cat <<EOF > /etc/apt/sources.list.d/kubernetes.sources
X-Repolib-Name: Kubernetes
Enabled: yes
Types: deb
URIs: https://apt.kubernetes.io/
Suites: kubernetes-xenial
Architectures: amd64
Components: main
Signed-By: /etc/apt/trusted.gpg.d/kubernetes-archive-keyring.gpg
X-Repolib-ID: Kubernetes
EOF

apt update
apt install -y kubelet kubeadm kubectl
apt-mark hold kubelet kubeadm kubectl

Now try kubeadm again.

Oh sonovabitch! Config not well described: https://github.com/containerd/containerd/issues/6964

Fixed config /etc/containerd/config.toml:

version = 2

disabled_plugins = []

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
    base_runtime_spec = ""
    cni_conf_dir = ""
    cni_max_conf_num = 0
    container_annotations = []
    pod_annotations = []
    privileged_without_host_devices = false
    runtime_engine = ""
    runtime_path = ""
    runtime_root = ""
    runtime_type = "io.containerd.runc.v2"

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    BinaryName = ""
    CriuImagePath = ""
    CriuPath = ""
    CriuWorkPath = ""
    IoGid = 0
    IoUid = 0
    NoNewKeyring = false
    NoPivotRoot = false
    Root = ""
    ShimCgroup = ""
    SystemdCgroup = true

# They suggest pinning this image, so we'll do that. This is the out-of-box default.
# https://kubernetes.io/docs/setup/production-environment/container-runtimes/#override-pause-image-containerd
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.k8s.io/pause:3.9"

We could/should be using kubeadm init with a configuration file: https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/

Apr 15 04:48:26 illustrious.thighhighs.top systemd[1]: Started kubelet: The Kubernetes Node Agent.
Apr 15 04:48:26 illustrious.thighhighs.top kubelet[12354]: Flag --container-runtime-endpoint has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Apr 15 04:48:26 illustrious.thighhighs.top kubelet[12354]: Flag --pod-infra-container-image has been deprecated, will be removed in a future release. Image garbage collector will get sandbox image information from CRI.

But screw that. Because guess what, it's also poorly documented!

Initialising the control plane now actually works

kubeadm init --control-plane-endpoint=persica-endpoint

Set up my `~/.kube/` config stuff as directed. Apparently this is an uber-superuser, so I shouldn't be using it regularly. Oh.

cat <<EOF > kubeconfig_example.yml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
# Will be used as the target "cluster" in the kubeconfig
clusterName: "persica"
# Will be used as the "server" (IP or DNS name) of this cluster in the kubeconfig
controlPlaneEndpoint: "persica-endpoint.thighhighs.top:6443"
# The cluster CA key and certificate will be loaded from this local directory
certificatesDir: "/etc/kubernetes/pki"
EOF

# on illustrious
kubeadm kubeconfig user --config kubeconfig_example.yml --client-name furinkan --validity-period 8760h

Now try adding a pod network. We'll use Flannel, and find the docs ourselves: https://github.com/flannel-io/flannel#deploying-flannel-manually

# from suomi
kubectl --context=persica-admin apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

kubectl --context=persica-admin get pods --all-namespaces
NAMESPACE      NAME                                                 READY   STATUS              RESTARTS      AGE
kube-flannel   kube-flannel-ds-zr6fb                                0/1     CrashLoopBackOff    1 (16s ago)   34s
kube-system    coredns-5d78c9869d-mp7p9                             0/1     ContainerCreating   0             66m
kube-system    coredns-5d78c9869d-tlsc6                             0/1     ContainerCreating   0             66m
kube-system    etcd-illustrious.thighhighs.top                      1/1     Running             1             66m
kube-system    kube-apiserver-illustrious.thighhighs.top            1/1     Running             1             66m
kube-system    kube-controller-manager-illustrious.thighhighs.top   1/1     Running             1             66m
kube-system    kube-proxy-5mntm                                     1/1     Running             0             66m
kube-system    kube-scheduler-illustrious.thighhighs.top            1/1     Running             1             66m

Doesn't work because we don't have the same podCIDR, and the default isn't compatible with whatever kubeadm does? FFS!

https://devops.stackexchange.com/questions/5898/how-to-get-kubernetes-pod-network-cidr

Okay so I can either nuke the cluster and reinstantiate it with podCIDR, or just reinstall the network plugin or something. Let's try the latter.

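get the current podCIDR: https://devops.stackexchange.com/a/14867
kubeadm config print init-defaults | grep serviceSubnet
- Note that this prints kubeadm's default service subnet, which is not the same thing as the pod CIDR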
wget https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
Edit it
Reapply it? kubectl apply -f kube-flannel.yml
Is it still crashlooping? kubectl get pods --all-namespaces

Yeah.
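The "Edit it" step can be scripted. This is a sketch that assumes the manifest contains a "Network": "..." line inside net-conf.json, which is how kube-flannel.yml looked when I fetched it. The helper set_flannel_network and the 10.0.0.0/16 value are made up for illustration. It may also explain the continued crashlooping: if kubeadm init ran without --pod-network-cidr, the nodes may have no pod CIDR assigned at all, and then editing the manifest alone wouldn't help.

```shell
# Rewrite the Network value in a flannel manifest:
#   set_flannel_network CIDR FILE
set_flannel_network() {
  tmp=$(mktemp)
  sed "s|\"Network\": \"[^\"]*\"|\"Network\": \"$1\"|" "$2" > "$tmp" &&
    cat "$tmp" > "$2"
  rm -f "$tmp"
}

# Demonstration on a one-line stand-in for the real manifest:
sample=$(mktemp)
printf '      "Network": "10.244.0.0/16",\n' > "$sample"
set_flannel_network 10.0.0.0/16 "$sample"   # hypothetical CIDR
cat "$sample"

# The pod CIDRs actually assigned to nodes (empty if none) can be listed with:
#   kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'
```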

Fukkit try again

# on illustrious
kubeadm reset
rm -rf /etc/cni/net.d/
rm -rf ~/.kube/

# fix the init: https://github.com/flannel-io/flannel/issues/728#issuecomment-308878912
kubeadm init --control-plane-endpoint=persica-endpoint.thighhighs.top --pod-network-cidr=10.244.0.0/16

# Fix up my kubectl creds again

# install flannel again
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

# is it working now?
kubectl get pods --all-namespaces

# IT FUCKING WORKS!!
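For what it's worth, the init flags that finally worked could also be written as a kubeadm config file and passed with kubeadm init --config. This is an untested sketch of the equivalent. The helper write_kubeadm_config and the file name are made up, and the KubeletConfiguration part is an extra that just pins the systemd cgroup driver chosen earlier.

```shell
# Write a kubeadm config equivalent to:
#   kubeadm init --control-plane-endpoint=persica-endpoint.thighhighs.top \
#                --pod-network-cidr=10.244.0.0/16
write_kubeadm_config() {
  cat > "$1" <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "persica-endpoint.thighhighs.top:6443"
networking:
  podSubnet: "10.244.0.0/16"   # matches Flannel's default network
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
EOF
}

out=$(mktemp)
write_kubeadm_config "$out"
cat "$out"

# Then, on the control plane node:
#   kubeadm init --config /path/to/that/file
```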

Now we join some worker nodes to the cluster, finally.

# on persica1
kubeadm join persica-endpoint.thighhighs.top:6443 --token FOO.FOOFOOFOO \
        --discovery-token-ca-cert-hash sha256:BARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBAR

[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

It's joined but apparently NotReady:

root@illustrious:~# kubectl get nodes
NAME                         STATUS     ROLES           AGE    VERSION
illustrious.thighhighs.top   NotReady   control-plane   17m    v1.27.1
persica1                     NotReady   <none>          2m7s   v1.27.0

Apparently coredns won't start because of taints, as described here:

https://serverfault.com/questions/1064936/coredns-pods-stuck-in-pending-state
- No explanation as to why the taints aren't going away
Similar problem here
- Someone says to just restart containerd

Fuck yoooooouuu, now the coredns containers are running. I probably shouldn't have jumped the gun and joined all the worker nodes... I need to kick them so they start properly.

root@illustrious:~# kubectl get pods --all-namespaces
NAMESPACE      NAME                                                 READY   STATUS              RESTARTS   AGE
kube-flannel   kube-flannel-ds-4p4wd                                0/1     Init:0/2            0          21m
kube-flannel   kube-flannel-ds-6qfrm                                0/1     Init:0/2            0          12m
kube-flannel   kube-flannel-ds-kb94w                                0/1     Init:0/2            0          12m
kube-flannel   kube-flannel-ds-vctrt                                1/1     Running             0          30m
kube-system    coredns-5d78c9869d-dqnkh                             1/1     Running             0          36m
kube-system    coredns-5d78c9869d-rbmhm                             1/1     Running             0          36m
kube-system    etcd-illustrious.thighhighs.top                      1/1     Running             2          36m
kube-system    kube-apiserver-illustrious.thighhighs.top            1/1     Running             2          36m
kube-system    kube-controller-manager-illustrious.thighhighs.top   1/1     Running             0          36m
kube-system    kube-proxy-8dl56                                     0/1     ContainerCreating   0          12m
kube-system    kube-proxy-dppxt                                     0/1     ContainerCreating   0          21m
kube-system    kube-proxy-ljk6c                                     1/1     Running             0          36m
kube-system    kube-proxy-t7gcn                                     0/1     ContainerCreating   0          12m
kube-system    kube-scheduler-illustrious.thighhighs.top            1/1     Running             2          36m

Try deleting and re-adding a node. From https://stackoverflow.com/a/54220808/806927

# on illustrious
kubectl get nodes
kubectl drain persica1
kubectl drain persica1 --ignore-daemonsets --delete-local-data
kubectl delete node persica1

# on persica1
kubeadm reset

then join again

Looks like the kube-proxy is having trouble starting on persica1. And while it's only a warning, I bet it's more significant than that.

root@illustrious:~# kubectl get pods --all-namespaces
NAMESPACE      NAME                                                 READY   STATUS              RESTARTS   AGE
kube-flannel   kube-flannel-ds-gjq5h                                0/1     Init:0/2            0          3m33s
kube-flannel   kube-flannel-ds-vctrt                                1/1     Running             0          41m
kube-system    coredns-5d78c9869d-dqnkh                             1/1     Running             0          47m
kube-system    coredns-5d78c9869d-rbmhm                             1/1     Running             0          47m
kube-system    etcd-illustrious.thighhighs.top                      1/1     Running             2          47m
kube-system    kube-apiserver-illustrious.thighhighs.top            1/1     Running             2          47m
kube-system    kube-controller-manager-illustrious.thighhighs.top   1/1     Running             0          47m
kube-system    kube-proxy-ljk6c                                     1/1     Running             0          47m
kube-system    kube-proxy-xpv58                                     0/1     ContainerCreating   0          3m33s
kube-system    kube-scheduler-illustrious.thighhighs.top            1/1     Running             2          47m

root@illustrious:~# kubectl get events --namespace=kube-system | grep pod/kube-proxy-xpv58
4m29s       Normal    Scheduled                pod/kube-proxy-xpv58   Successfully assigned kube-system/kube-proxy-xpv58 to persica1
9s          Warning   FailedCreatePodSandBox   pod/kube-proxy-xpv58   Failed to create pod sandbox: open /run/systemd/resolve/resolv.conf: no such file or directory


# on persica1
mkdir /run/systemd/resolve
ln -s /etc/resolv.conf /run/systemd/resolve/resolv.conf
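One catch with the manual fix: /run is a tmpfs, so the directory and symlink vanish on reboot. A tmpfiles.d entry would recreate them at boot. This is a sketch. The helper write_resolv_tmpfiles and the file name kubelet-resolv.conf are made up, and it writes to a scratch directory here instead of /etc/tmpfiles.d.

```shell
# Write a tmpfiles.d drop-in that recreates the directory and symlink.
# $1 is the target directory (/etc/tmpfiles.d on a real node, needs root).
write_resolv_tmpfiles() {
  mkdir -p "$1"
  cat > "$1/kubelet-resolv.conf" <<'EOF'
# Type  Path                              Mode  User  Group  Age  Argument
d       /run/systemd/resolve              0755  root  root   -    -
L       /run/systemd/resolve/resolv.conf  -     -     -      -    /etc/resolv.conf
EOF
}

scratch=$(mktemp -d)
write_resolv_tmpfiles "$scratch"
cat "$scratch/kubelet-resolv.conf"

# On a real node (as root), apply it without rebooting:
#   systemd-tmpfiles --create /etc/tmpfiles.d/kubelet-resolv.conf
```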

wtf now there's another error:

root@illustrious:~# kubectl get events --namespace=kube-system | grep pod/kube-proxy-grqhf
20s         Normal    Scheduled                pod/kube-proxy-grqhf   Successfully assigned kube-system/kube-proxy-grqhf to persica1
6s          Warning   FailedCreatePodSandBox   pod/kube-proxy-grqhf   Failed to create pod sandbox: rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument

I think I haven't deployed a good containerd config everywhere yet. Deployed that, and suddenly the damn kube-proxy and kube-flannel containers are working.
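A quick check before joining a node would have caught this. The sketch below greps a config.toml for the two settings from the fixed config above that I suspect matter: runtime_type, which is my guess at what the "Runtime.Name must be set" error was about, and SystemdCgroup. The helper name containerd_config_ok is made up.

```shell
# Succeeds if the containerd config names a runc v2 runtime and
# enables the systemd cgroup driver: containerd_config_ok FILE
containerd_config_ok() {
  grep -q 'runtime_type *= *"io.containerd.runc.v2"' "$1" &&
  grep -q 'SystemdCgroup *= *true' "$1"
}

# Demonstration on a stand-in file:
sample=$(mktemp)
printf 'runtime_type = "io.containerd.runc.v2"\nSystemdCgroup = true\n' > "$sample"
containerd_config_ok "$sample" && echo "config looks sane"

# On a node:
#   containerd_config_ok /etc/containerd/config.toml || echo "fix me first"
#   systemctl restart containerd    # after deploying a new config
```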

Now I can add the other two nodes, still need to fix the resolv.conf manually.

root@illustrious:~# kubectl get nodes -o wide
NAME                         STATUS   ROLES           AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                    KERNEL-VERSION                CONTAINER-RUNTIME
illustrious.thighhighs.top   Ready    control-plane   78m     v1.27.1   192.168.1.12   <none>        Ubuntu 22.04.2 LTS          5.15.0-69-generic             containerd://1.6.20
persica1                     Ready    <none>          21m     v1.27.0   192.168.1.31   <none>        AlmaLinux 9.1 (Lime Lynx)   5.14.0-162.6.1.el9_1.x86_64   containerd://1.6.20
persica2                     Ready    <none>          2m41s   v1.27.0   192.168.1.32   <none>        AlmaLinux 9.1 (Lime Lynx)   5.14.0-162.6.1.el9_1.x86_64   containerd://1.6.20
persica3                     Ready    <none>          33s     v1.27.0   192.168.1.33   <none>        AlmaLinux 9.1 (Lime Lynx)   5.14.0-162.6.1.el9_1.x86_64   containerd://1.6.20

Good enough for now!

Making ingress work

I don't understand this well enough, but I want to use ingress-nginx. Here's a page about it, albeit not using raw kubectl: https://kubernetes.github.io/ingress-nginx/kubectl-plugin/

Maybe this one too: https://medium.com/tektutor/using-nginx-ingress-controller-in-kubernetes-bare-metal-setup-890eb4e7772

Making load balancing work

I thought I wouldn't need it, but it looks like I do, if I want sensible useful functionality. Here's an explanation of why I want to use MetalLB, and it's not just for BGP-based configs: https://github.com/kubernetes/ingress-nginx/blob/main/docs/deploy/baremetal.md

I'll use it in L2 mode with ARP/NDP I think. Just need to dedicate a bunch of IPs to it so it can manage the traffic to them.

-  ⇤ ← Revision 10 as of 2023-04-10 12:52:31 → 
  Size: 5769
  Editor: furinkan
  Comment: add note about flashing the BIOS
+   ← Revision 31 as of 2023-11-25 04:06:15 → ⇥
  Size: 32689
  Editor: furinkan
  Comment: get rancher running again
 Line 5:
- * Alma Linux 9.1 x64
+ * Alma Linux 9.3 x64
 Line 9:
-  * 128gb SSD
+  * 128gb SSD SK hynix SC311 SATA M.2
  * I bought the three of them for $405 in total, so $135 AUD each, in March 2023.

{{{#!wiki note
I last touched this in April 2023 and it was very annoying to get as far as I did. Next time I look at it, I think I will rebuild the cluster from scratch again, and use a different guide. Something with actual explanations and a few opinions, like this one: https://github.com/hobby-kube/guide
}}}
-Line 13:
+Line 18:
-== k8s notes ==

 * Make a simple 3-node cluster
 * Single-node control plane will run externally, on illustrious
 * Use kubeadm to build the cluster: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
 * Selected containerd as the container runtime
 * Will use Flannel as the networking plugin
 * Allocated IPs:
  * persica1 / 192.168.1.31
  * persica2 / 192.168.1.32
  * persica3 / 192.168.1.33
 * Ingress: undecided so far
 * Cgroup driver: let's use systemd
 * k8s version: whatever is latest right now (2023-04-04)

== Build notes ==

=== Per node ===

 * Update the BIOS using this guide: https://www.dell.com/support/kbdoc/en-au/000131486/update-the-dell-bios-in-a-linux-or-ubuntu-environment#updatebios2015
  * Despite the usual Dell docs saying you need to make a DOS boot disk and run the flash updater app from there, it turns out that the BIOS Flash Update target (mash F12 to get the one-time boot menu) can read the `9020MA19.exe` file from a FAT32 filesystem on a USB stick just fine
  * Not sure if this only works in UEFI mode or not, but I kinda don't care because we ''want'' to be in UEFI mode
  * This applies to systems made from 2015 or later
  * The latest BIOS update for the Optiplex 9020M is version A19, released 
 * Set BIOS to full UEFI mode, no legacy
 * We'll be using DHCP, so find the MAC address so we can give it a consistent IP address when it boots
 * Add the MAC address and IP assignment to dnsmasq on calico (a pihole box)
  * `/etc/dnsmasq.d/02-pihole-dhcp-persica-cluster.conf`
  * Something like this {{{
dhcp-host=98:90:96:BE:89:52,set:persica,192.168.1.31,persica1,5m
# one dhcp-host line per host
dhcp-boot=tag:persica,grub/grubx64.efi,illustrious.thighhighs.top,192.168.1.12
}}}
  * Run `pihole restartdns` after making changes
 * PXE boot for kickstart install, which will hit calico for DHCP, then illustrious for the boot image and kickstart config
 * tftpd-hpa is running on illustrious
  * Upstream repo mirror: https://repo.almalinux.org/almalinux/9/BaseOS/x86_64/os/EFI/BOOT/
  * Drop that content in `/srv/tftp/` {{{
root@illustrious:/srv/tftp# tree
.
├── BOOTX64.EFI
├── default.efi
├── grub
│   ├── grub.cfg
│   ├── grub.cfg-01-98-90-96-be-89-52
│   └── grubx64.efi
├── images
│   └── Alma-9.1
│       ├── initrd.img
│       └── vmlinuz
├── ipxe.efi
└── shimx64.efi
}}}
  * Add a grub config fragment for the host's MAC address: `grub.cfg-01-xx-xx-xx-xx-xx-xx`
  * Make sure the grub config has the correct URL for its kickstart config
 * kickstart file served from `/data/www/illustrious/ks`: https://illustrious.thighhighs.top/ks/persica1.ks.cfg
  * Make sure your per-host config file has the correct name
 * KS references:
  * Reference manual: https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/9/html/performing_an_advanced_rhel_9_installation/kickstart-commands-and-options-reference_installing-rhel-as-an-experienced-user#keyboard-required_kickstart-commands-for-system-configuration
  * Generator tool: https://access.redhat.com/labs/kickstartconfig/
 * k8s doesn't play well with swap so we need to disable it. Provision a minimal swap volume of 1gb, then disable it later
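
The per-host fragment name follows a fixed pattern, so it can be generated from the MAC address instead of typed by hand. A minimal sketch, assuming the naming seen in the tree above (`01-` prefix for Ethernet, then the MAC lowercased with dashes); `grub_cfg_name` is a made-up helper, not part of grub.
{{{
# Hypothetical helper: turn a MAC address into the per-host grub config filename.
# The 01- prefix is the ARP hardware type for Ethernet.
grub_cfg_name() {
    local mac
    # Lowercase the hex digits and swap colons for dashes
    mac=$(printf '%s' "$1" | tr 'A-F:' 'a-f-')
    printf 'grub.cfg-01-%s\n' "$mac"
}

grub_cfg_name 98:90:96:BE:89:52
}}}
For the MAC in the dnsmasq example this gives the same filename that already appears in `/srv/tftp/grub/`.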

This was useful for figuring out the TFTP stuff for the first time: https://askubuntu.com/questions/1183487/grub2-efi-boot-via-pxe-load-config-file-automatically

Paths are hardcoded into the `grubx64.efi` binary, meaning HDD and PXE versions aren't the same. Make sure you put all the grub stuff in a `grub/` directory. Check the `$prefix` to see where it's searching: 

=== UEFI settings ===

Get to the UEFI
 * Probably get stuck in windows for first boot
 * Win, then "UEFI", get to advanced startup options
 * Boot with Advanced Boot Options
 * Troubleshoot, Advanced Options, UEFI Firmware Settings, Restart

Record details
 * Get the LOM MAC Address from Settings, General, System Info

Change settings
 * General
  * Boot Sequence
   * Select UEFI boot list
  * Advanced Boot Options
   * Disable Legacy OPROMs
  * UEFI Boot Path Security
   * Set to Never
  * Date/Time
   * Set clock to approx correct for UTC time
 * System Configuration
  * Integrated NIC
   * Enable UEFI Network Stack
   * Enabled w/ PXE
  * SATA Operation
   * AHCI
  * SMART Reporting
   * Disabled, we don't need it
  * Audio
   * Disable all audio, we don't need it
 * Security
  * TPM Security
   * Check everything except Clear
   * Activated
  * CPU XD support
   * Enabled
 * Secure Boot
  * Secure Boot Enable
   * Disabled
 * Performance
  * Multi-core support: All
  * Speedstep: Enabled
  * C-states: Enabled
  * Limit CPUID: Disabled
  * Turboboost: Enabled
 * Power Management
  * AC Recovery: Power On
  * Deep Sleep Control: Disabled
  * USB Wake Support: Enable USB wake from Standby
  * Wake on LAN/WLAN: LAN with PXE Boot
  * Block Sleep: Enable blocking of sleep
 * POST Behaviour
  * Keyboard Errors: Disable error detection
 * Virtualisation support
  * Enable VT
  * Enable VT-d
  * Enable Trusted Execution

Reboot and go back in again.
 * Boot only from IPv4 with NIC (PXE boot)
+== Another rebuild attempt in late 2023 ==

A few changes for this one:
 * I'm going to use Rancher this time, or that guide linked above
 * Alma 9.3 because it's the latest
 * Move them to the "subnet" of 192.168.1.32/29 so I can configure the router to give them DHCP options easily
  * persica1 / 192.168.1.33
  * persica2 / 192.168.1.34
  * persica3 / 192.168.1.35
 * Buy another node and run the controller on that rather than illustrious, which in this case might be the rancher docker container
  * kalina / 192.168.1.39
  * persica / CNAME to kalina
 * Go with Longhorn for PVCs
 * Dunno what to do about ingress yet
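
A quick sanity check on that address block: a /29 spans eight addresses. The sketch below is plain shell arithmetic; `block_range` is a hypothetical helper that only handles prefixes of /24 or longer. It shows 192.168.1.39 as the last address in the block, which would be the broadcast address if this were a real subnet. That is presumably harmless here, assuming the /29 is only a grouping for DHCP options inside a larger LAN.
{{{
# Hypothetical helper: print the first and last address of an IPv4 block
# given as a.b.c.d/len. Only handles prefixes of /24 or longer.
block_range() {
    local ip=${1%/*} len=${1#*/}
    local base=${ip%.*} last=${ip##*.}
    local size=$(( 1 << (32 - len) ))
    # Round the last octet down to the start of its block
    local start=$(( last / size * size ))
    printf '%s.%d-%s.%d\n' "$base" "$start" "$base" "$(( start + size - 1 ))"
}

block_range 192.168.1.32/29
}}}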


=== Prepare kalina controller node ===

Build [[servers/azusa]] as the network services node.

Kickstart build kalina using the configs on azusa.

Run ansible against kalina; this will configure the OS and install docker.

Check that docker works {{{
docker run hello-world
}}}

Push the certs from illustrious to kalina: https://ranchermanager.docs.rancher.com/pages-for-subheaders/rancher-on-a-single-node-with-docker#option-c-bring-your-own-certificate-signed-by-a-recognized-ca

On illustrious: {{{
cd /etc/ssl/

rsync -avx \
  STAR_thighhighs_top.key \
  STAR_thighhighs_top.crtbundled \
  STAR_thighhighs_top.key.2023 \
  STAR_thighhighs_top.crtbundled.2023 \
  root@kalina:/etc/ssl/
}}}
Then on kalina: {{{
chown root:root /etc/ssl/STAR_thighhighs_top.*
}}}

'''This cgroup stuff might not affect Alma, we'll see'''

For context: Fucking cgroups, k3s dies instantly:
 * https://github.com/rancher/rancher/issues/35201#issuecomment-947331154
 * https://groups.google.com/g/linux.debian.bugs.dist/c/Z-Cc0WmlEGA/m/NB6XGDsnAwAJ
 * Answer is here, you append stuff to cmdline: https://github.com/rancher/rancher/issues/36165

Looks like cgroups is still boned on Alma9 as well, tried running this: {{{
grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"
}}} Then rebooting.
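
One way to tell whether the kernel argument took effect after the reboot is to look at what is mounted on `/sys/fs/cgroup`: per the upstream Kubernetes docs, `stat -fc %T` reports `cgroup2fs` under cgroup v2 and `tmpfs` under v1. A small sketch; `cgroup_version` is a hypothetical helper name.
{{{
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup to a cgroup version
cgroup_version() {
    case "$1" in
        cgroup2fs) echo v2 ;;
        tmpfs)     echo v1 ;;
        *)         echo unknown ;;
    esac
}

# On the host itself (guarded, since it needs /sys mounted)
if [ -d /sys/fs/cgroup ]; then
    cgroup_version "$(stat -fc %T /sys/fs/cgroup)"
fi
}}}
With `systemd.unified_cgroup_hierarchy=0` applied, the expectation is `v1`; checking `/proc/cmdline` for the argument is another quick confirmation.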

Buuut it's still no good, wtf {{{
I1125 03:57:50.129406      93 network_policy_controller.go:163] Starting network policy controller
F1125 03:57:50.130225      93 network_policy_controller.go:404] failed to run iptables command to create KUBE-ROUTER-FORWARD chain due to running [/usr/bin/iptables -t filter -S KUBE-ROUTER-FORWARD 1 --wait]: exit status 3: iptables v1.8.8 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.
panic: F1125 03:57:50.130225      93 network_policy_controller.go:404] failed to run iptables command to create KUBE-ROUTER-FORWARD chain due to running [/usr/bin/iptables -t filter -S KUBE-ROUTER-FORWARD 1 --wait]: exit status 3: iptables v1.8.8 (legacy): can't initialize iptables table `filter': Table does not exist (do you need to insmod?)
Perhaps iptables or your kernel needs to be upgraded.


goroutine 14975 [running]:
k8s.io/klog/v2.(*loggingT).output(0x82f93c0, 0x3, 0x0, 0xc004466e70, 0x1, {0x67beecd?, 0x2?}, 0xc00629dc00?, 0x0)
        /go/pkg/mod/github.com/k3s-io/klog/v2@v2.80.1-k3s1/klog.go:926 +0x6bd
k8s.io/klog/v2.(*loggingT).printfDepth(0x82f93c0, 0x13?, 0x0, {0x0, 0x0}, 0x19c?, {0x4ec51ee, 0x3b}, {0xc004c5a6e0, 0x2, ...})
        /go/pkg/mod/github.com/k3s-io/klog/v2@v2.80.1-k3s1/klog.go:736 +0x1f3
k8s.io/klog/v2.(*loggingT).printf(...)
        /go/pkg/mod/github.com/k3s-io/klog/v2@v2.80.1-k3s1/klog.go:718
k8s.io/klog/v2.Fatalf(...)
        /go/pkg/mod/github.com/k3s-io/klog/v2@v2.80.1-k3s1/klog.go:1598
github.com/cloudnativelabs/kube-router/v2/pkg/controllers/netpol.(*NetworkPolicyController).ensureTopLevelChains(0xc0084ef7a0)
        /go/pkg/mod/github.com/k3s-io/kube-router/v2@v2.0.1-0.20230411195838-cced939a8ba1/pkg/controllers/netpol/network_policy_controller.go:404 +0x166f
github.com/cloudnativelabs/kube-router/v2/pkg/controllers/netpol.(*NetworkPolicyController).Run(0xc0084ef7a0, 0xc00f34eb40, 0xc0009dc4e0, 0xc006fdd380)
        /go/pkg/mod/github.com/k3s-io/kube-router/v2@v2.0.1-0.20230411195838-cced939a8ba1/pkg/controllers/netpol/network_policy_controller.go:167 +0x175
created by github.com/k3s-io/k3s/pkg/agent/netpol.Run
        /go/src/github.com/k3s-io/k3s/pkg/agent/netpol/netpol.go:141 +0xd85
}}}

Is it iptables, or is that a red herring? I can't tell if this helped: {{{
[root@kalina ~]# modprobe iptable_nat
[root@kalina ~]# modprobe br_netfilter
}}} but now I don't get an error when running `iptables -t filter -S KUBE-ROUTER-FORWARD 1 --wait` inside the container.
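
Modules loaded with `modprobe` are gone after a reboot. If they do turn out to matter, a fragment under `/etc/modules-load.d/` makes them persistent. A sketch only: `persist_modules` and the `rancher-k3s.conf` filename are made up, and it's an assumption that these two modules are the ones needed.
{{{
# Hypothetical helper: write a modules-load.d fragment, one module per line.
# First argument is the target directory, the rest are module names.
persist_modules() {
    local dir=$1; shift
    mkdir -p "$dir"
    printf '%s\n' "$@" > "$dir/rancher-k3s.conf"
}

# Real use, as root on kalina:
#   persist_modules /etc/modules-load.d iptable_nat br_netfilter
}}}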

This kinda describes the issue: https://slack-archive.rancher.com/t/9761163/hey-folks-i-have-a-quick-question-for-a-newbie-i-have-setup-


=== Run Rancher ===

Run the Rancher container according to this note about using ARM systems. All it really tells you is to specify an exact version, so you know it's built with arm64 support: https://ranchermanager.docs.rancher.com/how-to-guides/advanced-user-guides/enable-experimental-features/rancher-on-arm64
{{{
docker run -d --restart=unless-stopped \
  -p 443:443 \
  -v /etc/ssl/STAR_thighhighs_top.crtbundled:/etc/rancher/ssl/cert.pem \
  -v /etc/ssl/STAR_thighhighs_top.key:/etc/rancher/ssl/key.pem \
  --privileged \
  rancher/rancher:v2.7.9 \
  --no-cacerts
}}}

It'll take some time to start. Then you can try hitting the Rancher web UI: https://kalina.thighhighs.top/

Log in with the local user password as directed, then let it set the new admin password. Record it somewhere safe, and set the server URL to https://persica.thighhighs.top


=== Build the nodes ===

Prepare the nodes as described below in the "Hardware prep for the cluster nodes" section. This is mostly a one-time thing.

This just works now, huzzah! Manually kick the BIOS of each node to do a one-time PXE boot, then let it do its thing.


=== Node configs ===

Salvage my old ansible playbook stuff and copy it to asval. Run it from there.
{{{
apt install -y ansible sshpass

cd ~/git/persica-ansible/
make persica ARGS="-C --tags common"
}}}


=== Stand up the cluster ===

https://ranchermanager.docs.rancher.com/pages-for-subheaders/use-existing-nodes

Log in to each persica node and add root@vector's ssh pubkey to the `authorized_keys`

Create cluster in rancher, select RKE1, leave options as default, tick the boxes and find the command to run on each node.

Docker is already installed from my last attempt, so try to get it going.
{{{
systemctl enable docker.service --now

docker run -d --privileged --restart=unless-stopped --net=host -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run  rancher/rancher-agent:v2.7.9 --server https://persica.thighhighs.top --token lx5qjbl4dn7zkpbmt5qqz8qfdvtgsl2x5ft95j8lh785bxrjjccq2t --etcd --controlplane --worker

# The container name is randomly generated each run, find yours with `docker ps`
docker logs recursing_proskuriakova -f
}}}

Run this on each node to onboard it to the cluster.

Now whyTF can't persica2 and persica3 contact services on persica1..? Aha, firewalld is running on persica1, and it shouldn't be. Need to disable it on all three nodes.
{{{
systemctl disable firewalld.service --now
}}}

Find that it doesn't work and you can't make it work. Tear it all down and start again, killing every container, nuking files, and starting from scratch: https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/manage-clusters/clean-cluster-nodes#directories-and-files

Eventually, you get a cluster with three working nodes in it!!


=== Install kubectl on cluster controller vector ===

https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/#install-using-native-package-management
{{{
# Our cluster is k8s v1.23 so we can use kubectl as late as 1.24

curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.24/deb/Release.key | gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg

cat <<EOF > /etc/apt/sources.list.d/kubernetes.sources
X-Repolib-Name: Kubernetes
Enabled: yes
Types: deb
URIs: https://pkgs.k8s.io/core:/stable:/v1.24/deb/
Suites: /
Architectures: arm64
Signed-By: /etc/apt/keyrings/kubernetes-apt-keyring.gpg
EOF

apt update
apt install -y kubectl
}}}

Hit the kebab menu on the cluster and copy the kubeconfig to your clipboard: https://persica.thighhighs.top/dashboard/c/_/manager/provisioning.cattle.io.cluster

Go paste that into `~/.kube/config` in your account on vector, now you can run kubectl there!
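
The install comment above leans on the kubectl version skew rule (client within one minor version of the cluster). A tiny sketch of that check; `skew_ok` is a hypothetical helper and expects plain `major.minor` strings, so trim anything like a patch level first. The real versions come from `kubectl version`.
{{{
# Hypothetical helper: succeed if client and cluster are within one minor version.
# Arguments are major.minor strings: cluster first, then client.
skew_ok() {
    local cluster=${1#*.} client=${2#*.}
    local diff=$(( client - cluster ))
    # Strip a leading minus sign to get the absolute difference
    [ "${diff#-}" -le 1 ]
}

skew_ok 1.23 1.24 && echo "supported skew"
}}}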



=== Install Longhorn cluster storage manager ===

This is done from the builtin Helm charts, let it go to work. It's a couple of simple clicks.

For some reason the predefined things you can configure on the helm chart '''don't''' include the local path to the disk on each node. Which is pretty bloody obvious you'd think, but no. It'll default to `/var/lib/longhorn` or something unless you override it. Go to the YAML page and change the `defaultDataPath` to `/persist/longhorn/` instead, then run the install.

I had to disable selinux on the nodes because it broke volume attachment for some unknown reason. After doing that it eventually came good after some retries.

I tried out this dude's demo app that uses flask and redis to deploy a trivial website, that was a nifty test of all the bits working together as expected:
 * https://ranchergovernment.com/blog/article-simple-rke2-longhorn-and-rancher-install#longhorn-gui
 * https://raw.githubusercontent.com/clemenko/k8s_yaml/master/flask_simple_nginx.yml

Blessedly the ingress just works. No idea what to do yet to make a service that presents itself on public IPs.


=== Try MetalLB ===

Holy crap I think I got it working.
 * We'll use it in L2 mode, no BGP yet
 * Set aside 192.168.1.57 - 192.168.1.63 for load balanced services
 * Install it via Rancher helm chart interface, no config
 * Push a simple address pool and advertisement config {{{#!yaml
---
# https://metallb.universe.tf/configuration/#layer-2-configuration

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: metallb-pool-1
  namespace: metallb-system
spec:
  addresses:
    - 192.168.1.57-192.168.1.63

---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: metallb-pool-1
  namespace: metallb-system
# Not needed because L2Advertisement claims all IPAddressPools by default
spec:
  ipAddressPools:
    - metallb-pool-1
}}}
 * Copy the existing redis service in the example, and add an external access route to it as a secondary service {{{#!yaml
---
apiVersion: v1
kind: Service
metadata:
  namespace: flask
  name: redis-ext
  labels:
    name: redis
    kubernetes.io/name: "redis"

spec:
  selector:
    app: redis
  ports:
    - name: redis
      protocol: TCP
      port: 6379
  type: LoadBalancer
}}}

It's really as simple as adding `type: LoadBalancer`, then MetalLB selects the next free IP itself and binds it.
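
To confirm the address MetalLB picked actually came from the pool, it can be compared against the configured range. A sketch under assumptions: `in_pool` is a made-up helper, and it only copes with ranges that stay within the same first three octets, which is the case for the pool above.
{{{
# Hypothetical helper: succeed if an address falls inside a "first-last" range.
# Assumes both ends of the range share the same first three octets.
in_pool() {
    local ip=$1 first=${2%-*} last=${2#*-}
    [ "${ip%.*}" = "${first%.*}" ] || return 1
    local host=${ip##*.}
    [ "$host" -ge "${first##*.}" ] && [ "$host" -le "${last##*.}" ]
}

# With cluster access, the assigned address can be fetched like this:
#   ip=$(kubectl -n flask get svc redis-ext -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
if in_pool 192.168.1.57 192.168.1.57-192.168.1.63; then
    echo "in pool"
fi
}}}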



== Hardware prep for the cluster nodes ==

Setup each new node like so:
 * [[servers/HardwarePrep/DellOptiplex9020Micro]]
 * [[servers/HardwarePrep/LenovoThinkCentreM710q]]
-Line 144:
+Line 278:
-I should ansible'ise everything, making minimal assumptions about the kickstart part of the process.
+This is getting everything to the state where I can bootstrap the cluster. I should ansible'ise everything, making minimal assumptions about the kickstart part of the process.
-Line 147:
+Line 281:
+I have a basic set of roles to get the nodes into a workable state, right before I invoke `kubeadm` for the first time.
{{{
---
- name: Configure persica k8s cluster
  hosts: persica
  roles:
    - role: common
      tags: common
    - role: docker_for_kube
      tags: docker_for_kube
    - role: kube_daemons
      tags: kube_daemons
}}}

=== Initialise the control plane ===

This is manual of course, no ansible here.

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#initializing-your-control-plane-node

 1. This will be a single-node control plane, but we should specify `--control-plane-endpoint` anyway. persica1 is going to be our control plane.
 2. Our Pod network add-on will be Flannel. We can specify `--pod-network-cidr` but I'll try without first.
 3. It'll detect containerd
 4. The default `--apiserver-advertise-address` will be fine, let it autodetect

I added a custom CNAME record to local pihole (calico) and Gandi (public service), for `persica-endpoint` => `persica1`. Unlike the DHCP stuff, this is in the general DNS web interface, not a custom config file.

After a bunch of faffing around to fix up the firewall config, bridge filtering kernel module, and enabling ipv4 forwarding, the init begins after passing preflight checks.
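
For the record, the kernel module and forwarding part probably boils down to something like the following. This is a sketch based on the upstream kubeadm container runtime prerequisites, not a record of exactly what I ran; `write_k8s_prereqs` and the `k8s.conf` filenames are illustrative. The firewall side is not covered here, the preflight warning below names ports 6443 and 10250.
{{{
# Hypothetical helper: write the module and sysctl fragments under a root directory.
# Pass / as the root on a real node (as root), or a scratch directory to preview.
write_k8s_prereqs() {
    local root=${1%/}
    mkdir -p "$root/etc/modules-load.d" "$root/etc/sysctl.d"
    printf '%s\n' overlay br_netfilter > "$root/etc/modules-load.d/k8s.conf"
    cat > "$root/etc/sysctl.d/k8s.conf" <<EOF
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
}

# Preview in a scratch directory rather than touching the live system
scratch=$(mktemp -d)
write_k8s_prereqs "$scratch"
cat "$scratch/etc/sysctl.d/k8s.conf"
}}}
On a real node, follow up with `modprobe overlay`, `modprobe br_netfilter` and `sysctl --system` so the settings apply without a reboot.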

{{{
[root@persica1 ~]# kubeadm init --control-plane-endpoint=persica-endpoint
[init] Using Kubernetes version: v1.27.1
[preflight] Running pre-flight checks
        [WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
W0415 03:43:19.958609   39430 images.go:80] could not find officially supported version of etcd for Kubernetes v1.27.1, falling back to the nearest etcd version (3.5.7-0)
W0415 03:43:52.646765   39430 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.6" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.k8s.io/pause:3.9" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local persica-endpoint persica1] and IPs [10.96.0.1 192.168.1.31]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost persica1] and IPs [192.168.1.31 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost persica1] and IPs [192.168.1.31 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
W0415 03:44:21.781505   39430 images.go:80] could not find officially supported version of etcd for Kubernetes v1.27.1, falling back to the nearest etcd version (3.5.7-0)
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
        timed out waiting for the condition

This error is likely caused by:
        - The kubelet is not running
        - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
        - 'systemctl status kubelet'
        - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
        Once you have found the failing container, you can inspect its logs with:
        - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
}}}

No worky :/

https://serverfault.com/questions/1116281/kubeadm-1-25-init-failed-on-debian-11-with-containerd-connection-refused

Maybe I need the control plane on a separate node after all. I'll try illustrious.

 * copy containerd/config.toml to illustrious
 * apt install -y apt-transport-https ca-certificates curl
 * curl -fsSLo /etc/apt/trusted.gpg.d/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
 * prep repo defn {{{
cat <<EOF > /etc/apt/sources.list.d/kubernetes.sources
X-Repolib-Name: Kubernetes
Enabled: yes
Types: deb
URIs: https://apt.kubernetes.io/
Suites: kubernetes-xenial
Architectures: amd64
Components: main
Signed-By: /etc/apt/trusted.gpg.d/kubernetes-archive-keyring.gpg
X-Repolib-ID: Kubernetes
EOF
}}}
 * apt update
 * apt install -y kubelet kubeadm kubectl
 * apt-mark hold kubelet kubeadm kubectl

Now try kubeadm again.

----

Oh sonovabitch! Config not well described: https://github.com/containerd/containerd/issues/6964

Fixed config /etc/containerd/config.toml: {{{
version = 2

disabled_plugins = []

[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
    base_runtime_spec = ""
    cni_conf_dir = ""
    cni_max_conf_num = 0
    container_annotations = []
    pod_annotations = []
    privileged_without_host_devices = false
    runtime_engine = ""
    runtime_path = ""
    runtime_root = ""
    runtime_type = "io.containerd.runc.v2"

  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    BinaryName = ""
    CriuImagePath = ""
    CriuPath = ""
    CriuWorkPath = ""
    IoGid = 0
    IoUid = 0
    NoNewKeyring = false
    NoPivotRoot = false
    Root = ""
    ShimCgroup = ""
    SystemdCgroup = true

# They suggest pinning this image, so we'll do that. This is the out-of-box default.
# https://kubernetes.io/docs/setup/production-environment/container-runtimes/#override-pause-image-containerd
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.k8s.io/pause:3.9"
}}}
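
Since the whole point of the fixed config is `SystemdCgroup = true` landing under the right table, a quick check before retrying `kubeadm init` saves a round trip. A minimal sketch; `uses_systemd_cgroup` is a made-up helper and only looks for the key, it does not verify which TOML table the key sits in.
{{{
# Hypothetical helper: succeed if a containerd config enables the systemd cgroup driver.
# Only checks that the key is present and true, not where it sits in the file.
uses_systemd_cgroup() {
    grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "$1"
}

# Only meaningful on a host that has containerd configured
if [ -r /etc/containerd/config.toml ]; then
    if uses_systemd_cgroup /etc/containerd/config.toml; then
        echo "systemd cgroup driver enabled"
    else
        echo "systemd cgroup driver NOT enabled"
    fi
fi
}}}
Remember to `systemctl restart containerd` after editing the config, otherwise the old settings stay in effect.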

We could/should be using kubeadm init with a configuration file:
https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/
{{{
Apr 15 04:48:26 illustrious.thighhighs.top systemd[1]: Started kubelet: The Kubernetes Node Agent.
Apr 15 04:48:26 illustrious.thighhighs.top kubelet[12354]: Flag --container-runtime-endpoint has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Apr 15 04:48:26 illustrious.thighhighs.top kubelet[12354]: Flag --pod-infra-container-image has been deprecated, will be removed in a future release. Image garbage collector will get sandbox image information from CRI.
}}}

But screw that. Because guess what, it's also poorly documented!


=== Initialising the control plane now actually works ===

{{{
kubeadm init --control-plane-endpoint=persica-endpoint

# Set up my `~/.kube/` config stuff as directed. Apparently this is an uber-superuser, so I shouldn't be using it regularly. Oh.

cat <<EOF > kubeconfig_example.yml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
# Will be used as the target "cluster" in the kubeconfig
clusterName: "persica"
# Will be used as the "server" (IP or DNS name) of this cluster in the kubeconfig
controlPlaneEndpoint: "persica-endpoint.thighhighs.top:6443"
# The cluster CA key and certificate will be loaded from this local directory
certificatesDir: "/etc/kubernetes/pki"
EOF

# on illustrious
kubeadm kubeconfig user --config kubeconfig_example.yml --client-name furinkan --validity-period 8760h
}}}

Now try adding a pod network. We'll use Flannel, and find the docs ourselves: https://github.com/flannel-io/flannel#deploying-flannel-manually

{{{
# from suomi
kubectl --context=persica-admin apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

kubectl --context=persica-admin get pods --all-namespaces
NAMESPACE      NAME                                                 READY   STATUS              RESTARTS      AGE
kube-flannel   kube-flannel-ds-zr6fb                                0/1     CrashLoopBackOff    1 (16s ago)   34s
kube-system    coredns-5d78c9869d-mp7p9                             0/1     ContainerCreating   0             66m
kube-system    coredns-5d78c9869d-tlsc6                             0/1     ContainerCreating   0             66m
kube-system    etcd-illustrious.thighhighs.top                      1/1     Running             1             66m
kube-system    kube-apiserver-illustrious.thighhighs.top            1/1     Running             1             66m
kube-system    kube-controller-manager-illustrious.thighhighs.top   1/1     Running             1             66m
kube-system    kube-proxy-5mntm                                     1/1     Running             0             66m
kube-system    kube-scheduler-illustrious.thighhighs.top            1/1     Running             1             66m
}}}

Doesn't work because we don't have the same podCIDR, and the default isn't compatible with whatever kubeadm does? FFS!

https://devops.stackexchange.com/questions/5898/how-to-get-kubernetes-pod-network-cidr

Okay so I can either nuke the cluster and reinstantiate it with podCIDR, or just reinstall the network plugin or something. Let's try the latter.
 * get the current podCIDR: https://devops.stackexchange.com/a/14867
 * kubeadm config print init-defaults | grep serviceSubnet
 * wget https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
 * Edit it
 * Reapply it? kubectl apply -f kube-flannel.yml
 * Is it still crashlooping? kubectl get pods --all-namespaces

Yeah.
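
The mismatch is easier to see by putting the two values side by side. Note that `serviceSubnet` is the range for Service IPs, not pods, so it isn't the value to compare. A sketch, assuming the Flannel manifest still keeps its pod network as a `"Network"` key inside the `net-conf.json` block; `flannel_network` is a hypothetical helper, and the sample file below just mimics that fragment of the manifest.
{{{
# Hypothetical helper: pull the pod network out of a kube-flannel.yml
flannel_network() {
    sed -n 's/.*"Network": *"\([^"]*\)".*/\1/p' "$1"
}

# Stand-in for the relevant fragment of the downloaded kube-flannel.yml
sample=$(mktemp)
cat > "$sample" <<'EOF'
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
EOF
flannel_network "$sample"

# What the cluster actually assigned to each node (needs cluster access):
#   kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'
}}}
If the jsonpath query comes back empty, the cluster was initialised without a pod network CIDR.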

=== Fukkit try again ===

{{{
# on illustrious
kubeadm reset
rm -rf /etc/cni/net.d/
rm -rf ~/.kube/

# fix the init: https://github.com/flannel-io/flannel/issues/728#issuecomment-308878912
kubeadm init --control-plane-endpoint=persica-endpoint.thighhighs.top --pod-network-cidr=10.244.0.0/16

# Fix up my kubectl creds again

# install flannel again
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml

# is it working now?
kubectl get pods --all-namespaces

# IT FUCKING WORKS!!
}}}

Now we join some worker nodes to the cluster, finally.

{{{
# on persica1
kubeadm join persica-endpoint.thighhighs.top:6443 --token FOO.FOOFOOFOO \
        --discovery-token-ca-cert-hash sha256:BARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBAR

[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
}}}

It's joined but apparently `NotReady`: {{{
root@illustrious:~# kubectl get nodes
NAME                         STATUS     ROLES           AGE    VERSION
illustrious.thighhighs.top   NotReady   control-plane   17m    v1.27.1
persica1                     NotReady   <none>          2m7s   v1.27.0
}}}

Apparently coredns won't start because of taints, as described here:
 * https://serverfault.com/questions/1064936/coredns-pods-stuck-in-pending-state
  * No explanation as to why the taints aren't going away
 * Similar problem here
  * Someone says to just restart containerd

Fuck yoooooouuu, now the coredns containers are running. I probably shouldn't have jumped the gun and joined all the worker nodes... I need to kick them so they start properly.
{{{
root@illustrious:~# kubectl get pods --all-namespaces
NAMESPACE      NAME                                                 READY   STATUS              RESTARTS   AGE
kube-flannel   kube-flannel-ds-4p4wd                                0/1     Init:0/2            0          21m
kube-flannel   kube-flannel-ds-6qfrm                                0/1     Init:0/2            0          12m
kube-flannel   kube-flannel-ds-kb94w                                0/1     Init:0/2            0          12m
kube-flannel   kube-flannel-ds-vctrt                                1/1     Running             0          30m
kube-system    coredns-5d78c9869d-dqnkh                             1/1     Running             0          36m
kube-system    coredns-5d78c9869d-rbmhm                             1/1     Running             0          36m
kube-system    etcd-illustrious.thighhighs.top                      1/1     Running             2          36m
kube-system    kube-apiserver-illustrious.thighhighs.top            1/1     Running             2          36m
kube-system    kube-controller-manager-illustrious.thighhighs.top   1/1     Running             0          36m
kube-system    kube-proxy-8dl56                                     0/1     ContainerCreating   0          12m
kube-system    kube-proxy-dppxt                                     0/1     ContainerCreating   0          21m
kube-system    kube-proxy-ljk6c                                     1/1     Running             0          36m
kube-system    kube-proxy-t7gcn                                     0/1     ContainerCreating   0          12m
kube-system    kube-scheduler-illustrious.thighhighs.top            1/1     Running             2          36m
}}}

Try deleting and re-adding a node. From https://stackoverflow.com/a/54220808/806927
{{{
# on illustrious
kubectl get nodes
kubectl drain persica1
# --delete-local-data is the deprecated spelling of --delete-emptydir-data
kubectl drain persica1 --ignore-daemonsets --delete-emptydir-data
kubectl delete node persica1

# on persica1
kubeadm reset

then join again
}}}

Looks like the kube-proxy is having trouble starting on persica1. And while it's only a warning, I bet it's more significant than that.
{{{
root@illustrious:~# kubectl get pods --all-namespaces
NAMESPACE      NAME                                                 READY   STATUS              RESTARTS   AGE
kube-flannel   kube-flannel-ds-gjq5h                                0/1     Init:0/2            0          3m33s
kube-flannel   kube-flannel-ds-vctrt                                1/1     Running             0          41m
kube-system    coredns-5d78c9869d-dqnkh                             1/1     Running             0          47m
kube-system    coredns-5d78c9869d-rbmhm                             1/1     Running             0          47m
kube-system    etcd-illustrious.thighhighs.top                      1/1     Running             2          47m
kube-system    kube-apiserver-illustrious.thighhighs.top            1/1     Running             2          47m
kube-system    kube-controller-manager-illustrious.thighhighs.top   1/1     Running             0          47m
kube-system    kube-proxy-ljk6c                                     1/1     Running             0          47m
kube-system    kube-proxy-xpv58                                     0/1     ContainerCreating   0          3m33s
kube-system    kube-scheduler-illustrious.thighhighs.top            1/1     Running             2          47m

root@illustrious:~# kubectl get events --namespace=kube-system | grep pod/kube-proxy-xpv58
4m29s       Normal    Scheduled                pod/kube-proxy-xpv58   Successfully assigned kube-system/kube-proxy-xpv58 to persica1
9s          Warning   FailedCreatePodSandBox   pod/kube-proxy-xpv58   Failed to create pod sandbox: open /run/systemd/resolve/resolv.conf: no such file or directory


# on persica1, work around the missing file
mkdir -p /run/systemd/resolve
ln -s /etc/resolv.conf /run/systemd/resolve/resolv.conf

# wtf, now there's another error:

root@illustrious:~# kubectl get events --namespace=kube-system | grep pod/kube-proxy-grqhf
20s         Normal    Scheduled                pod/kube-proxy-grqhf   Successfully assigned kube-system/kube-proxy-grqhf to persica1
6s          Warning   FailedCreatePodSandBox   pod/kube-proxy-grqhf   Failed to create pod sandbox: rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument
}}}

I think I hadn't deployed a good containerd config everywhere yet. Deployed that, and suddenly the damn kube-proxy and kube-flannel containers are working.
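For next time, a rough sketch of what "a good containerd config" might mean. This is an assumption, not a record of what I actually deployed: the `container.Runtime.Name must be set` error commonly shows up when `/etc/containerd/config.toml` is only a partial config, and regenerating the full default and switching on the systemd cgroup driver is the usual fix. The function name and the config path are illustrative.

```shell
# Flip the runc runtime to the systemd cgroup driver in a containerd config file.
# Assumes the file came from "containerd config default", which contains
# the line "SystemdCgroup = false" under the runc options. Uses GNU sed -i.
enable_systemd_cgroup() {
    conf="$1"
    sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' "$conf"
}

# On a real node (paths assumed, run as root):
#   containerd config default > /etc/containerd/config.toml
#   enable_systemd_cgroup /etc/containerd/config.toml
#   systemctl restart containerd
```

The same file would need to go to every node, which is really a job for ansible rather than doing it by hand.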

Now I can add the other two nodes, though I still need to fix the resolv.conf manually on each one.
{{{
root@illustrious:~# kubectl get nodes -o wide
NAME                         STATUS   ROLES           AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                    KERNEL-VERSION                CONTAINER-RUNTIME
illustrious.thighhighs.top   Ready    control-plane   78m     v1.27.1   192.168.1.12   <none>        Ubuntu 22.04.2 LTS          5.15.0-69-generic             containerd://1.6.20
persica1                     Ready    <none>          21m     v1.27.0   192.168.1.31   <none>        AlmaLinux 9.1 (Lime Lynx)   5.14.0-162.6.1.el9_1.x86_64   containerd://1.6.20
persica2                     Ready    <none>          2m41s   v1.27.0   192.168.1.32   <none>        AlmaLinux 9.1 (Lime Lynx)   5.14.0-162.6.1.el9_1.x86_64   containerd://1.6.20
persica3                     Ready    <none>          33s     v1.27.0   192.168.1.33   <none>        AlmaLinux 9.1 (Lime Lynx)   5.14.0-162.6.1.el9_1.x86_64   containerd://1.6.20
}}}
Good enough for now!
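Since the resolv.conf workaround has to be repeated on every worker, here's a small sketch that wraps it in an idempotent function. The function name and the optional root prefix (handy for a dry run in a scratch directory) are my own invention. My guess, unverified, is that the kubelet config handed out at join time points `resolvConf` at the systemd-resolved path because the control plane runs Ubuntu, while Alma doesn't run systemd-resolved. Also note that `/run` is a tmpfs, so the symlink won't survive a reboot.

```shell
# Create /run/systemd/resolve/resolv.conf as a symlink to /etc/resolv.conf.
# $1 is an optional root prefix for testing; leave it empty on a real node.
fix_resolv_conf() {
    root="${1:-}"
    dir="$root/run/systemd/resolve"
    link="$dir/resolv.conf"
    mkdir -p "$dir"
    # Only create the link if nothing is there already (file or dangling symlink).
    if [ ! -e "$link" ] && [ ! -L "$link" ]; then
        ln -s /etc/resolv.conf "$link"
    fi
}

# On each worker, as root:
#   fix_resolv_conf
```

A cleaner fix is probably to change the kubelet's `resolvConf` setting on the Alma nodes instead, but I haven't tried that.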


== Making ingress work ==

I don't understand this well enough, but I want to use ingress-nginx. Here's a page about it, albeit not using raw kubectl: https://kubernetes.github.io/ingress-nginx/kubectl-plugin/

Maybe this one too: https://medium.com/tektutor/using-nginx-ingress-controller-in-kubernetes-bare-metal-setup-890eb4e7772

== Making load balancing work ==

I thought I wouldn't need it, but it looks like I do if I want sensible, useful functionality. Here's an explanation of why I want to use MetalLB, and it's not just for BGP-based configs: https://github.com/kubernetes/ingress-nginx/blob/main/docs/deploy/baremetal.md

I'll use it in L2 mode with ARP/NDP I think. Just need to dedicate a bunch of IPs to it so it can manage the traffic to them.
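A minimal sketch of what the L2 setup could look like, assuming MetalLB is already installed in the `metallb-system` namespace and is new enough to be configured through its `IPAddressPool` and `L2Advertisement` resources. The resource names, the function name, and the address range in the example are placeholders, not something I've picked yet.

```shell
# Write a MetalLB layer 2 config to a file.
# $1 = output file, $2 = address range to hand over to MetalLB.
write_metallb_l2_config() {
    out="$1"
    range="$2"
    cat > "$out" <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: persica-pool
  namespace: metallb-system
spec:
  addresses:
    - ${range}
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: persica-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - persica-pool
EOF
}

# Example with a hypothetical range (must be outside the router's DHCP pool):
#   write_metallb_l2_config metallb-l2.yaml 192.168.1.240-192.168.1.250
#   kubectl apply -f metallb-l2.yaml
```

With that applied, a Service of type LoadBalancer (such as the ingress-nginx controller's) should get an address from the pool, and MetalLB answers ARP/NDP for it from one of the nodes.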
