persica cluster
This is a cluster of three identical nodes, named persica1, persica2 and persica3:
- AlmaLinux 9.1 x64
- Dell OptiPlex 9020 Micro
- Intel Core i5-4590T @ 2.00 GHz
- 16 GB DDR3-1600
- 128 GB SSD
I last touched this in April 2023 and it was very annoying to get as far as I did. Next time I look at it, I think I will rebuild the cluster from scratch again, and use a different guide. Something with actual explanations and a few opinions, like this one: https://github.com/hobby-kube/guide
Another rebuild attempt in late 2023
A few changes for this one:
- I'm going to use Rancher this time, or that guide linked above
- AlmaLinux 9.2, because it's the latest
- Move them to the "subnet" 192.168.1.32/29 so I can configure the router to give them DHCP options easily
  - persica1 / 192.168.1.33
  - persica2 / 192.168.1.34
  - persica3 / 192.168.1.35
- Put the controller node on vector rather than illustrious; in this case the controller might just be the Rancher Docker container
  - vector / 192.168.1.32 (should probably be a static IP)
  - persica / CNAME to vector
- Go with Longhorn for PVCs
- Dunno what to do about ingress yet
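A quick sanity check of the address plan. A /29 has 32 - 29 = 3 host bits, so 2^3 = 8 addresses: 192.168.1.32 to 192.168.1.39. In a real /29, .32 would be the network address, but the LAN itself is still a /24 and this block is only a grouping for the router's DHCP options, so vector can sit on .32. This is a minimal sketch that tests membership with a bitmask. The helper names ip_to_int and in_cidr are my own (hypothetical), not from any tool.

```shell
# Sketch: is an address inside 192.168.1.32/29 (i.e. .32 to .39)?
ip_to_int() {
  # Pack a dotted quad into one integer: a.b.c.d -> a*2^24 + b*2^16 + c*2^8 + d
  local o1 o2 o3 o4
  IFS=. read -r o1 o2 o3 o4 <<EOF
$1
EOF
  echo $(( (o1 << 24) | (o2 << 16) | (o3 << 8) | o4 ))
}

in_cidr() {
  # $1 = address, $2 = block as NETWORK/PREFIX; succeeds if $1 is inside it
  local ip net prefix mask
  ip=$(ip_to_int "$1")
  net=$(ip_to_int "${2%/*}")
  prefix=${2#*/}
  # Keep the top $prefix bits; for /29 this is 255.255.255.248
  mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  [ $(( ip & mask )) -eq $(( net & mask )) ]
}

# in_cidr 192.168.1.35 192.168.1.32/29 && echo "persica3 is in the block"
```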
Prepare vector controller node
Build the base OS as per notes on servers/vector#Build_notes
Copy root's ssh pubkey to illustrious
TFTP server
Install the daemon
apt install -y tftpd-hpa
Copy your stuff into /srv/tftp
rsync -avx root@illustrious:/srv/tftp/ /srv/tftp/
HTTP server
We need to serve the kickstart files via HTTP.
Install package
apt install -y micro-httpd
SKIP THIS, IT CAN KEEP PORT 80. Configure it to listen on port 8080 instead, so that Rancher can have port 80.
systemctl edit micro-httpd.socket
# Put this in there when prompted
[Socket]
ListenStream=
ListenStream=0.0.0.0:8080
# Just to be sure
systemctl restart micro-httpd.socket
Create the httpd docroot
mkdir -p /var/www/html/ks
Copy the kickstart files in there
rsync -avx illustrious:/data/www/illustrious/ks/ /var/www/html/ks/
Prepare to run rancher
Install docker engine: https://docs.docker.com/engine/install/debian/#install-using-the-repository
curl -fsSL https://download.docker.com/linux/debian/gpg | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
chmod 0644 /etc/apt/keyrings/docker.gpg
cat <<EOF > /etc/apt/sources.list.d/docker.sources
X-Repolib-Name: Docker
Enabled: yes
Types: deb
URIs: https://download.docker.com/linux/debian
Suites: bookworm
Architectures: arm64
Components: stable
Signed-By: /etc/apt/keyrings/docker.gpg
EOF
apt update
# Find versions
apt-cache madison docker-ce | awk '{ print $3 }'
# Install desired
VERSION_STRING="5:23.0.6-1~debian.12~bookworm"
apt install docker-ce=$VERSION_STRING docker-ce-cli=$VERSION_STRING containerd.io cgroupfs-mount
docker run hello-world
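The sources file above hard-codes arm64 and bookworm. This is a minimal sketch that fills both in from the host instead, assuming dpkg and /etc/os-release are present (true on Debian). The function name docker_sources is hypothetical; the stanza is the same deb822 layout used above.

```shell
# Sketch: emit the Docker deb822 source for a given architecture and codename.
docker_sources() {
  # $1 = dpkg architecture (e.g. arm64), $2 = Debian codename (e.g. bookworm)
  cat <<EOF
X-Repolib-Name: Docker
Enabled: yes
Types: deb
URIs: https://download.docker.com/linux/debian
Suites: $2
Architectures: $1
Components: stable
Signed-By: /etc/apt/keyrings/docker.gpg
EOF
}

# On the real host:
# . /etc/os-release
# docker_sources "$(dpkg --print-architecture)" "$VERSION_CODENAME" \
#   > /etc/apt/sources.list.d/docker.sources
```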
Prep certs: https://ranchermanager.docs.rancher.com/pages-for-subheaders/rancher-on-a-single-node-with-docker#option-c-bring-your-own-certificate-signed-by-a-recognized-ca
rsync -avx root@illustrious:/etc/ssl/STAR_thighhighs_top.key /etc/ssl/
rsync -avx root@illustrious:/etc/ssl/STAR_thighhighs_top.crtbundled /etc/ssl/
rsync -avx root@illustrious:/etc/ssl/STAR_thighhighs_top.key.2023 /etc/ssl/
rsync -avx root@illustrious:/etc/ssl/STAR_thighhighs_top.crtbundled.2023 /etc/ssl/
chown root:root /etc/ssl/STAR_thighhighs_top.*
Fix the bloody cgroups because this OS is special: append these options to the end of the single line in /boot/cmdline.txt
cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory systemd.unified_cgroup_hierarchy=false systemd.legacy_systemd_cgroup_controller=false
Reboot now for the cgroup stuff to take effect.
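Appending by hand is easy to get wrong, because cmdline.txt has to stay a single line and running the step twice duplicates options. This is a minimal sketch of an idempotent version. The function name and the CGROUP_OPTS variable are my own (hypothetical); the options are exactly the ones listed above.

```shell
# Sketch: append kernel options to cmdline.txt, skipping ones already present.
CGROUP_OPTS="cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory systemd.unified_cgroup_hierarchy=false systemd.legacy_systemd_cgroup_controller=false"

append_cmdline_opts() {
  # $1 = path to cmdline.txt; remaining args = options that must be present
  local file="$1" line opt
  shift
  line=$(head -n 1 "$file")       # the whole command line lives on line 1
  for opt in "$@"; do
    case " $line " in
      *" $opt "*) ;;              # already there, leave it alone
      *) line="$line $opt" ;;
    esac
  done
  printf '%s\n' "$line" > "$file" # write back as exactly one line
}

# cp /boot/cmdline.txt /boot/cmdline.txt.bak
# append_cmdline_opts /boot/cmdline.txt $CGROUP_OPTS
```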
Run Rancher
Run the Rancher container according to this note about ARM systems. It just tells you to specify an exact version so you know the image is built with arm64 support: https://ranchermanager.docs.rancher.com/how-to-guides/advanced-user-guides/enable-experimental-features/rancher-on-arm64
docker run -d --restart=unless-stopped \
  -p 443:443 \
  -v /etc/ssl/STAR_thighhighs_top.crtbundled:/etc/rancher/ssl/cert.pem \
  -v /etc/ssl/STAR_thighhighs_top.key:/etc/rancher/ssl/key.pem \
  --privileged \
  rancher/rancher:v2.7.9 \
  --no-cacerts
It'll take some time to start. Then you can try hitting the Rancher web UI: https://vector.thighhighs.top/
Log in with the local user password as directed, then let it set the new admin password. Record it somewhere safe, and set the server URL to https://persica.thighhighs.top
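A sketch for the "take some time to start" and "password as directed" parts. It assumes Rancher answers "pong" on /ping once it's up, and that the one-time bootstrap password shows up in the container log on a "Bootstrap Password:" line, which is how the Rancher single-node docs describe finding it. The container name "rancher" is hypothetical; use whatever docker ps shows. Function names are my own.

```shell
# Sketch: wait for Rancher, then fish the bootstrap password out of its log.
wait_for_rancher() {
  # $1 = base URL, $2 = max attempts, 10 s apart (default 60)
  local url="$1" tries="${2:-60}"
  while [ "$tries" -gt 0 ]; do
    # -k so a not-yet-trusted cert doesn't block the health check
    [ "$(curl -sk "$url/ping")" = "pong" ] && return 0
    tries=$((tries - 1))
    sleep 10
  done
  return 1
}

bootstrap_password() {
  # Reads log text on stdin, prints the password from the last matching line
  sed -n 's/.*Bootstrap Password: *//p' | tail -n 1
}

# wait_for_rancher https://vector.thighhighs.top 60 &&
#   docker logs rancher 2>&1 | bootstrap_password
```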
Stand up the cluster
https://ranchermanager.docs.rancher.com/pages-for-subheaders/use-existing-nodes
Log in to each persica node and add root@vector's ssh pubkey to the authorized_keys
Create a cluster in Rancher, select RKE1, leave the options as default, tick the boxes and find the command to run on each node.
docker is already installed from my last attempt, try to get it going.
systemctl enable docker.service --now
docker run -d --privileged --restart=unless-stopped --net=host \
  -v /etc/kubernetes:/etc/kubernetes -v /var/run:/var/run \
  rancher/rancher-agent:v2.7.9 \
  --server https://persica.thighhighs.top \
  --token lx5qjbl4dn7zkpbmt5qqz8qfdvtgsl2x5ft95j8lh785bxrjjccq2t \
  --etcd --controlplane --worker
# the container name is whatever docker picked, check docker ps
docker logs recursing_proskuriakova -f
Run this on each node to onboard it to the cluster.
Now whyTF can't persica2 and persica3 contact services on persica1..? Aha, firewalld is running on persica1, and it shouldn't be. Need to disable it on all three nodes.
systemctl disable firewalld.service --now
Find that it doesn't work and you can't make it work. Tear it all down and start again, killing every container, nuking files, and starting from scratch: https://ranchermanager.docs.rancher.com/how-to-guides/new-user-guides/manage-clusters/clean-cluster-nodes#directories-and-files
Eventually, you get a cluster with three working nodes in it!!
Previous Rancher issues on asval
Fucking cgroups, k3s dies instantly:
https://github.com/rancher/rancher/issues/35201#issuecomment-947331154
https://groups.google.com/g/linux.debian.bugs.dist/c/Z-Cc0WmlEGA/m/NB6XGDsnAwAJ
EDITOR=vim systemctl edit docker.service
[Service]
ExecStart=
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock --exec-opt native.cgroupdriver=cgroupfs
Real answer here: https://github.com/rancher/rancher/issues/36165
systemctl revert docker.service
append to /boot/cmdline.txt:
cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory systemd.unified_cgroup_hierarchy=false systemd.legacy_systemd_cgroup_controller=false
and reboot
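To confirm the cmdline.txt change actually took effect after the reboot, check what /sys/fs/cgroup is mounted as: a cgroup2 filesystem on a unified (v2) host, a tmpfs holding the v1 controllers otherwise. A small sketch; cgroup_mode is a hypothetical helper name.

```shell
# Sketch: report which cgroup hierarchy the running kernel booted with.
cgroup_mode() {
  case "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)" in
    cgroup2fs) echo v2 ;;      # unified hierarchy
    tmpfs)     echo v1 ;;      # legacy (or hybrid) hierarchy
    *)         echo unknown ;;
  esac
}

# Docker's own view of the driver and version:
# docker info --format '{{.CgroupDriver}} {{.CgroupVersion}}'
```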
Eugh still not running:
2023/11/08 16:56:05 [INFO] Waiting for server to become available: the server is currently unable to handle the request
2023/11/08 16:56:07 [INFO] Waiting for server to become available: the server is currently unable to handle the request
2023/11/08 16:56:19 [INFO] Running in single server mode, will not peer connections
2023/11/08 16:56:19 [INFO] Applying CRD features.management.cattle.io
2023/11/08 16:56:20 [INFO] Waiting for CRD features.management.cattle.io to become available
2023/11/08 16:56:20 [INFO] Done waiting for CRD features.management.cattle.io to become available
2023/11/08 16:56:21 [FATAL] k3s exited with: exit status 1
root@asval:~# less -S k3s.log
I1108 16:56:20.763603 52 cpu_manager.go:214] "Starting CPU manager" policy="none"
I1108 16:56:20.763761 52 cpu_manager.go:215] "Reconciling" reconcilePeriod="10s"
I1108 16:56:20.763895 52 state_mem.go:36] "Initialized new in-memory state store"
I1108 16:56:20.784356 52 policy_none.go:49] "None policy: Start"
E1108 16:56:20.796489 52 nodelease.go:49] "Failed to get node when trying to set owner ref to the node lease" err="nodes \"local-node\" not found" node="local-node"
I1108 16:56:20.804858 52 memory_manager.go:169] "Starting memorymanager" policy="None"
I1108 16:56:20.809257 52 state_mem.go:35] "Initializing new in-memory state store"
I1108 16:56:20.846809 52 kubelet_node_status.go:73] "Successfully registered node" node="local-node"
E1108 16:56:20.866904 52 kubelet.go:1466] "Failed to start ContainerManager" err="failed to get rootfs info: unable to find data in memory cache"
Aaaand this might be failing because asval only has 1GB RAM, gdi. Might need to redo this from scratch using servers/vector, I think she has 4GB RAM.
Build the nodes
This just works now, huzzah! Manually kick the BIOS of each node to do a one-time PXE boot, then let it do its thing.
Node configs
Salvage my old ansible playbook stuff and copy it to asval. Run it from there.
apt install -y ansible sshpass
cd ~/git/persica-ansible/
make persica ARGS="-C --tags common"
k8s notes
- Make a simple 3-node cluster
- Single-node control plane will run externally, on illustrious
Use kubeadm to build the cluster: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/install-kubeadm/
- Selected containerd as the container runtime
- Will use Flannel as the networking plugin
- Allocated IPs:
- persica1 / 192.168.1.31
- persica2 / 192.168.1.32
- persica3 / 192.168.1.33
- Ingress: undecided so far
- Cgroup driver: let's use systemd
- k8s version: whatever is latest right now (2023-04-04)
Build notes
Per node
Update the BIOS using this guide: https://www.dell.com/support/kbdoc/en-au/000131486/update-the-dell-bios-in-a-linux-or-ubuntu-environment#updatebios2015
Despite the usual Dell docs saying you need to make a DOS boot disk and run the flash updater app from there, it turns out that the BIOS Flash Update target (mash F12 to get the one-time boot menu) can read the 9020MA19.exe file from a FAT32 filesystem on a USB stick just fine.
Not sure if this only works in UEFI mode or not, but I kinda don't care because we want to be in UEFI mode anyway.
- This applies to systems made from 2015 or later
- The latest BIOS update for the Optiplex 9020M is version A19, released
- Set BIOS to full UEFI mode, no legacy
- We'll be using DHCP, so find the MAC address so we can give it a consistent IP address when it boots
- Add the MAC address and IP assignment to dnsmasq on calico (a pihole box)
/etc/dnsmasq.d/02-pihole-dhcp-persica-cluster.conf
Something like this:
# one dhcp-host line per host
dhcp-host=98:90:96:BE:89:52,set:persica,192.168.1.31,persica1,5m
dhcp-boot=tag:persica,grub/grubx64.efi,illustrious.thighhighs.top,192.168.1.12
Run pihole restartdns after making changes
- PXE boot for kickstart install, which will hit calico for DHCP, then illustrious for the boot image and kickstart config
- tftpd-hpa is running on illustrious
Upstream repo mirror: https://repo.almalinux.org/almalinux/9/BaseOS/x86_64/os/EFI/BOOT/
Drop that content in /srv/tftp/
root@illustrious:/srv/tftp# tree
.
├── BOOTX64.EFI
├── default.efi
├── grub
│   ├── grub.cfg
│   ├── grub.cfg-01-98-90-96-be-89-52
│   └── grubx64.efi
├── images
│   └── Alma-9.1
│       ├── initrd.img
│       └── vmlinuz
├── ipxe.efi
└── shimx64.efi
Add a grub config fragment for the host's MAC address: grub.cfg-01-xx-xx-xx-xx-xx-xx
- Make sure the grub config has the correct URL for its kickstart config
kickstart file served from /data/www/illustrious/ks: https://illustrious.thighhighs.top/ks/persica1.ks.cfg
- Make sure your per-host config file has the correct name
- KS references:
Generator tool: https://access.redhat.com/labs/kickstartconfig/
- k8s doesn't play well with swap, so we need to disable it. Provision a minimal 1 GB swap volume, then disable it later
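For the "disable it later" part, swap has to be turned off now and kept from coming back at boot. This is a minimal sketch that comments out active swap entries in fstab; the fstab path is a parameter so it can be tried on a copy first. comment_out_swap is a hypothetical helper name.

```shell
# Sketch: comment out every active fstab entry whose fs type (field 3) is swap.
comment_out_swap() {
  # $1 = fstab path; the original is kept next to it as $1.bak
  cp "$1" "$1.bak"
  awk '!/^[ \t]*#/ && $3 == "swap" { $0 = "#" $0 } { print }' "$1.bak" > "$1"
}

# swapoff -a
# comment_out_swap /etc/fstab
# swapon --show    # prints nothing once no swap is active
```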
This was useful for figuring out the TFTP stuff for the first time: https://askubuntu.com/questions/1183487/grub2-efi-boot-via-pxe-load-config-file-automatically
Paths are hardcoded into the grubx64.efi binary, meaning HDD and PXE versions aren't the same. Make sure you put all the grub stuff in a grub/ directory. Check the $prefix to see where it's searching:
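Both per-host PXE pieces are keyed on the node's MAC address. Network-booted GRUB appears to look for grub.cfg-01-MAC in its $prefix directory before falling back to grub.cfg, where 01 is the Ethernet hardware type and the MAC is lowercased with dashes; that's what the file in the tftp listing relies on. A small sketch that derives both the dnsmasq line (same format as the pihole fragment) and the GRUB filename. Helper names are hypothetical, and only persica1's MAC is recorded in these notes.

```shell
# Sketch: per-host PXE artefacts derived from a MAC address.
dnsmasq_host_line() {
  # $1 = MAC, $2 = IP, $3 = hostname
  printf 'dhcp-host=%s,set:persica,%s,%s,5m\n' "$1" "$2" "$3"
}

grub_cfg_name() {
  # $1 = MAC as aa:bb:cc:dd:ee:ff (any case) -> grub.cfg-01-aa-bb-cc-dd-ee-ff
  printf 'grub.cfg-01-%s\n' "$(printf '%s' "$1" | tr 'ABCDEF:' 'abcdef-')"
}

# dnsmasq_host_line 98:90:96:BE:89:52 192.168.1.31 persica1 \
#   >> /etc/dnsmasq.d/02-pihole-dhcp-persica-cluster.conf
# cp /srv/tftp/grub/grub.cfg "/srv/tftp/grub/$(grub_cfg_name 98:90:96:BE:89:52)"
# ...then point that copy's kickstart URL at persica1.ks.cfg
```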
UEFI settings
Get to the UEFI
- Probably get stuck in Windows for first boot
- Win, then "UEFI", get to advanced startup options
- Boot with Advanced Boot Options
- Troubleshoot, Advanced Options, UEFI Firmware Settings, Restart
Record details
- Get the LOM MAC Address from Settings, General, System Info
Change settings
- General
- Boot Sequence
- Select UEFI boot list
- Advanced Boot Options
- Disable Legacy OPROMs
- UEFI Boot Path Security
- Set to Never
- Date/Time
- Set clock to approx correct for UTC time
- Boot Sequence
- System Configuration
- Integrated NIC
- Enable UEFI Network Stack
- Enabled w/ PXE
- SATA Operation
- AHCI
- SMART Reporting
- Disabled, we don't need it
- Audio
- Disable all audio, we don't need it
- Integrated NIC
- Security
- TPM Security
- Check everything except Clear
- Activated
- CPU XD support
- Enabled
- TPM Security
- Secure Boot
- Secure Boot Enable
- Disabled
- Secure Boot Enable
- Performance
- Multi-core support: All
- Speedstep: Enabled
- C-states: Enabled
- Limit CPUID: Disabled
- Turboboost: Enabled
- Power Management
- AC Recovery: Power On
- Deep Sleep Control: Disabled
- USB Wake Support: Enable USB wake from Standby
- Wake on LAN/WLAN: LAN with PXE Boot
- Block Sleep: Enable blocking of sleep
- POST Behaviour
- Keyboard Errors: Disable error detection
- Virtualisation support
- Enable VT
- Enable VT-d
- Enable Trusted Execution
Reboot and go back in again.
- Boot only from IPv4 with NIC (PXE boot)
Ansible management after kickstart build
This is getting everything to the state where I can bootstrap the cluster. I should ansible'ise everything, making minimal assumptions about the kickstart part of the process.
I'm keeping a simple ansible repo in ~/git/persica-ansible/
I have a basic set of roles to get the nodes into a workable state, right before I invoke kubeadm for the first time.
---
- name: Configure persica k8s cluster
  hosts: persica
  roles:
    - role: common
      tags: common
    - role: docker_for_kube
      tags: docker_for_kube
    - role: kube_daemons
      tags: kube_daemons
Initialise the control plane
This is manual of course, no ansible here.
This will be a single-node control plane, but we should specify --control-plane-endpoint anyway. persica1 is going to be our control plane.
Our Pod network add-on will be Flannel. We can specify --pod-network-cidr but I'll try without first.
- It'll detect containerd
The default --apiserver-advertise-address will be fine, let it autodetect
I added a custom CNAME record to local pihole (calico) and Gandi (public service), for persica-endpoint => persica1. Unlike the DHCP stuff, this is in the general DNS web interface, not a custom config file.
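The bridge filtering module and IPv4 forwarding that had to be fixed before preflight would pass correspond to the kernel prerequisites in the Kubernetes container runtime docs: load overlay and br_netfilter at boot, let iptables see bridged traffic, and turn on forwarding. A sketch of that as something the ansible common role could template. The root parameter and the function name are my own (hypothetical) so the files can be written somewhere harmless; leave root empty on a real node.

```shell
# Sketch: persist the k8s kernel module and sysctl prerequisites.
write_k8s_prereqs() {
  # $1 = optional root prefix (empty means the real /etc)
  local root="${1:-}"
  mkdir -p "$root/etc/modules-load.d" "$root/etc/sysctl.d"
  cat > "$root/etc/modules-load.d/k8s.conf" <<EOF
overlay
br_netfilter
EOF
  cat > "$root/etc/sysctl.d/k8s.conf" <<EOF
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
}

# On a real node, as root:
# write_k8s_prereqs
# modprobe overlay && modprobe br_netfilter
# sysctl --system
```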
After a bunch of faffing around to fix up the firewall config, bridge filtering kernel module, and enabling ipv4 forwarding, the init begins after passing preflight checks.
[root@persica1 ~]# kubeadm init --control-plane-endpoint=persica-endpoint
[init] Using Kubernetes version: v1.27.1
[preflight] Running pre-flight checks
    [WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
W0415 03:43:19.958609 39430 images.go:80] could not find officially supported version of etcd for Kubernetes v1.27.1, falling back to the nearest etcd version (3.5.7-0)
W0415 03:43:52.646765 39430 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.6" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.k8s.io/pause:3.9" as the CRI sandbox image.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local persica-endpoint persica1] and IPs [10.96.0.1 192.168.1.31]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost persica1] and IPs [192.168.1.31 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost persica1] and IPs [192.168.1.31 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
W0415 03:44:21.781505 39430 images.go:80] could not find officially supported version of etcd for Kubernetes v1.27.1, falling back to the nearest etcd version (3.5.7-0)
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
    timed out waiting for the condition
This error is likely caused by:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
    - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
    Once you have found the failing container, you can inspect its logs with:
    - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
No worky :/
Maybe I need the control plane on a separate node after all. I'll try illustrious.
- copy containerd/config.toml to illustrious
- apt install -y apt-transport-https ca-certificates curl
curl -fsSLo /etc/apt/trusted.gpg.d/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
Prepare the repo definition:
cat <<EOF > /etc/apt/sources.list.d/kubernetes.sources
X-Repolib-Name: Kubernetes
Enabled: yes
Types: deb
URIs: https://apt.kubernetes.io/
Suites: kubernetes-xenial
Architectures: amd64
Components: main
Signed-By: /etc/apt/trusted.gpg.d/kubernetes-archive-keyring.gpg
X-Repolib-ID: Kubernetes
EOF
- apt update
- apt install -y kubelet kubeadm kubectl
- apt-mark hold kubelet kubeadm kubectl
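If I redo this step, the repo above won't work as written: the Google-hosted apt repo (apt.kubernetes.io / packages.cloud.google.com) was deprecated and frozen in 2023, and packages now come from the community-owned pkgs.k8s.io, with one repo per minor version. A sketch of the equivalent deb822 source, assuming v1.27 to match this cluster. pkgs.k8s.io is a flat repo, so the suite is "/" and there is no Components line. The keyring filename and function name are my own choices (hypothetical).

```shell
# Sketch: deb822 source for the pkgs.k8s.io Kubernetes repo.
K8S_MINOR="v1.27"   # assumption: match the cluster's minor version

k8s_sources() {
  # $1 = minor version, e.g. v1.27
  cat <<EOF
X-Repolib-Name: Kubernetes
Enabled: yes
Types: deb
URIs: https://pkgs.k8s.io/core:/stable:/$1/deb/
Suites: /
Signed-By: /etc/apt/keyrings/kubernetes-apt-keyring.gpg
EOF
}

# mkdir -p /etc/apt/keyrings
# curl -fsSL "https://pkgs.k8s.io/core:/stable:/$K8S_MINOR/deb/Release.key" |
#   gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
# k8s_sources "$K8S_MINOR" > /etc/apt/sources.list.d/kubernetes.sources
# apt update && apt install -y kubelet kubeadm kubectl
```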
Now try kubeadm again.
Oh sonovabitch! Config not well described: https://github.com/containerd/containerd/issues/6964
Fixed config /etc/containerd/config.toml:
version = 2
disabled_plugins = []
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes]
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
    base_runtime_spec = ""
    cni_conf_dir = ""
    cni_max_conf_num = 0
    container_annotations = []
    pod_annotations = []
    privileged_without_host_devices = false
    runtime_engine = ""
    runtime_path = ""
    runtime_root = ""
    runtime_type = "io.containerd.runc.v2"
    [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
      BinaryName = ""
      CriuImagePath = ""
      CriuPath = ""
      CriuWorkPath = ""
      IoGid = 0
      IoUid = 0
      NoNewKeyring = false
      NoPivotRoot = false
      Root = ""
      ShimCgroup = ""
      SystemdCgroup = true
# They suggest pinning this image, so we'll do that. This is the out-of-box default.
# https://kubernetes.io/docs/setup/production-environment/container-runtimes/#override-pause-image-containerd
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.k8s.io/pause:3.9"
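Another way out of the partial-config trap, which I haven't tried here: start from containerd's own full default config (containerd config default) and flip only the two settings that matter. The sed patterns assume the layout containerd 1.6 prints, i.e. a "SystemdCgroup = false" line and a pause:3.6 sandbox_image line, which matches the kubeadm warning in the earlier log. patch_containerd_config is a hypothetical helper name.

```shell
# Sketch: patch a default containerd config for kubeadm + systemd cgroups.
patch_containerd_config() {
  # $1 = path to config.toml (GNU sed in-place edit)
  sed -i \
    -e 's/SystemdCgroup = false/SystemdCgroup = true/' \
    -e 's#sandbox_image = ".*"#sandbox_image = "registry.k8s.io/pause:3.9"#' \
    "$1"
}

# On each node:
# containerd config default > /etc/containerd/config.toml
# patch_containerd_config /etc/containerd/config.toml
# systemctl restart containerd
```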
We could/should be using kubeadm init with a configuration file: https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/
Apr 15 04:48:26 illustrious.thighhighs.top systemd[1]: Started kubelet: The Kubernetes Node Agent.
Apr 15 04:48:26 illustrious.thighhighs.top kubelet[12354]: Flag --container-runtime-endpoint has been deprecated, This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.
Apr 15 04:48:26 illustrious.thighhighs.top kubelet[12354]: Flag --pod-infra-container-image has been deprecated, will be removed in a future release. Image garbage collector will get sandbox image information from CRI.
But screw that. Because guess what, it's also poorly documented!
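If I do go down the config-file route next time, this is roughly what it could look like. It's a sketch I haven't run, using the kubeadm v1beta3 and kubelet v1beta1 config APIs that go with v1.27. It folds in the two flags that eventually worked (the control-plane endpoint and Flannel's 10.244.0.0/16 pod CIDR) plus the systemd cgroup driver, which kubeadm has defaulted to since v1.22 anyway. The function and file names are hypothetical.

```shell
# Sketch: write a kubeadm init config equivalent to the working flags.
write_kubeadm_config() {
  # $1 = output path
  cat > "$1" <<'EOF'
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
clusterName: persica
controlPlaneEndpoint: "persica-endpoint.thighhighs.top:6443"
networking:
  podSubnet: "10.244.0.0/16"   # Flannel's default Network
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
EOF
}

# write_kubeadm_config kubeadm-init.yaml
# kubeadm init --config kubeadm-init.yaml
```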
Initialising the control plane now actually works
kubeadm init --control-plane-endpoint=persica-endpoint
Set up my ~/.kube/ config stuff as directed. Apparently this is an uber-superuser, so I shouldn't be using it regularly. Oh.
cat <<EOF > kubeconfig_example.yml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
# Will be used as the target "cluster" in the kubeconfig
clusterName: "persica"
# Will be used as the "server" (IP or DNS name) of this cluster in the kubeconfig
controlPlaneEndpoint: "persica-endpoint.thighhighs.top:6443"
# The cluster CA key and certificate will be loaded from this local directory
certificatesDir: "/etc/kubernetes/pki"
EOF
# on illustrious
kubeadm kubeconfig user --config kubeconfig_example.yml --client-name furinkan --validity-period 8760h
Now try adding a pod network. We'll use Flannel, and find the docs ourselves: https://github.com/flannel-io/flannel#deploying-flannel-manually
# from suomi
kubectl --context=persica-admin apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
kubectl --context=persica-admin get pods --all-namespaces
NAMESPACE      NAME                                                 READY   STATUS              RESTARTS      AGE
kube-flannel   kube-flannel-ds-zr6fb                                0/1     CrashLoopBackOff    1 (16s ago)   34s
kube-system    coredns-5d78c9869d-mp7p9                             0/1     ContainerCreating   0             66m
kube-system    coredns-5d78c9869d-tlsc6                             0/1     ContainerCreating   0             66m
kube-system    etcd-illustrious.thighhighs.top                      1/1     Running             1             66m
kube-system    kube-apiserver-illustrious.thighhighs.top            1/1     Running             1             66m
kube-system    kube-controller-manager-illustrious.thighhighs.top   1/1     Running             1             66m
kube-system    kube-proxy-5mntm                                     1/1     Running             0             66m
kube-system    kube-scheduler-illustrious.thighhighs.top            1/1     Running             1             66m
Doesn't work because we don't have the same podCIDR, and the default isn't compatible with whatever kubeadm does? FFS!
https://devops.stackexchange.com/questions/5898/how-to-get-kubernetes-pod-network-cidr
Okay so I can either nuke the cluster and reinstantiate it with podCIDR, or just reinstall the network plugin or something. Let's try the latter.
get the current podCIDR: https://devops.stackexchange.com/a/14867
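- kubeadm config print init-defaults | grep serviceSubnet (careful: this prints kubeadm's built-in defaults and the Service subnet, not the pod CIDR the running cluster was actually given)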
wget https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
- Edit it
- Reapply it? kubectl apply -f kube-flannel.yml
- Is it still crashlooping? kubectl get pods --all-namespaces
Yeah.
Fukkit try again
# on illustrious
kubeadm reset
rm -rf /etc/cni/net.d/
rm -rf ~/.kube/
# fix the init: https://github.com/flannel-io/flannel/issues/728#issuecomment-308878912
kubeadm init --control-plane-endpoint=persica-endpoint.thighhighs.top --pod-network-cidr=10.244.0.0/16
# Fix up my kubectl creds again
# install flannel again
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
# is it working now?
kubectl get pods --all-namespaces
# IT FUCKING WORKS!!
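Why the first attempt broke, as far as I can tell: kubeadm only turns on per-node pod CIDR allocation when --pod-network-cidr (podSubnet) is set, and Flannel's manifest expects every node to have a pod CIDR inside the Network in its net-conf.json, 10.244.0.0/16 by default. A sketch of a check to run before applying Flannel next time. Both helper names are hypothetical; the jsonpath reads each node's spec.podCIDR.

```shell
# Sketch: compare Flannel's expected Network with what nodes were allocated.
flannel_network() {
  # $1 = local copy of kube-flannel.yml; prints the "Network" value
  sed -n 's/.*"Network": *"\([^"]*\)".*/\1/p' "$1" | head -n 1
}

node_pod_cidrs() {
  # Prints "node<TAB>podCIDR"; an empty CIDR means none was allocated
  kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
}

# wget https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
# flannel_network kube-flannel.yml   # then check node_pod_cidrs fall inside it
```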
Now we join some worker nodes to the cluster, finally.
# on persica1
kubeadm join persica-endpoint.thighhighs.top:6443 --token FOO.FOOFOOFOO \
    --discovery-token-ca-cert-hash sha256:BARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBARBAR
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
It's joined but apparently NotReady:
root@illustrious:~# kubectl get nodes
NAME                         STATUS     ROLES           AGE    VERSION
illustrious.thighhighs.top   NotReady   control-plane   17m    v1.27.1
persica1                     NotReady   <none>          2m7s   v1.27.0
Apparently coredns won't start because of taints, as described here:
https://serverfault.com/questions/1064936/coredns-pods-stuck-in-pending-state
- No explanation as to why the taints aren't going away
- Similar problem here
- Someone says to just restart containerd
Fuck yoooooouuu, now the coredns containers are running. I probably shouldn't have jumped the gun and joined all the worker nodes... I need to kick them so they start properly.
root@illustrious:~# kubectl get pods --all-namespaces
NAMESPACE      NAME                                                 READY   STATUS              RESTARTS   AGE
kube-flannel   kube-flannel-ds-4p4wd                                0/1     Init:0/2            0          21m
kube-flannel   kube-flannel-ds-6qfrm                                0/1     Init:0/2            0          12m
kube-flannel   kube-flannel-ds-kb94w                                0/1     Init:0/2            0          12m
kube-flannel   kube-flannel-ds-vctrt                                1/1     Running             0          30m
kube-system    coredns-5d78c9869d-dqnkh                             1/1     Running             0          36m
kube-system    coredns-5d78c9869d-rbmhm                             1/1     Running             0          36m
kube-system    etcd-illustrious.thighhighs.top                      1/1     Running             2          36m
kube-system    kube-apiserver-illustrious.thighhighs.top            1/1     Running             2          36m
kube-system    kube-controller-manager-illustrious.thighhighs.top   1/1     Running             0          36m
kube-system    kube-proxy-8dl56                                     0/1     ContainerCreating   0          12m
kube-system    kube-proxy-dppxt                                     0/1     ContainerCreating   0          21m
kube-system    kube-proxy-ljk6c                                     1/1     Running             0          36m
kube-system    kube-proxy-t7gcn                                     0/1     ContainerCreating   0          12m
kube-system    kube-scheduler-illustrious.thighhighs.top            1/1     Running             2          36m
Try deleting and re-adding a node. From https://stackoverflow.com/a/54220808/806927
# on illustrious
kubectl get nodes
kubectl drain persica1
# --delete-emptydir-data is the current name of the deprecated --delete-local-data
kubectl drain persica1 --ignore-daemonsets --delete-emptydir-data
kubectl delete node persica1
# on persica1
kubeadm reset
# then join again
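The same drain/delete/reset/join cycle as one function, run from illustrious. Join tokens expire (24 hours by default), so this asks the control plane for a fresh join command with kubeadm token create --print-join-command instead of reusing the old one. A sketch assuming passwordless root ssh to the node; the run wrapper, DRY_RUN switch and function names are my own (hypothetical), and with DRY_RUN=1 it only prints what it would do.

```shell
# Sketch: re-add a worker node from the control plane.
run() { if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi; }

rejoin_node() {
  # $1 = node name as shown by kubectl get nodes
  local node="$1" join_cmd
  run kubectl drain "$node" --ignore-daemonsets --delete-emptydir-data
  run kubectl delete node "$node"
  run ssh "root@$node" kubeadm reset -f
  if [ -n "${DRY_RUN:-}" ]; then
    join_cmd="kubeadm join PLACEHOLDER"   # dry run: don't mint a real token
  else
    join_cmd=$(kubeadm token create --print-join-command)
  fi
  run ssh "root@$node" "$join_cmd"
}

# DRY_RUN=1 rejoin_node persica1   # look first
# rejoin_node persica1
```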
Looks like the kube-proxy is having trouble starting on persica1. And while it's only a warning, I bet it's more significant than that.
root@illustrious:~# kubectl get pods --all-namespaces
NAMESPACE      NAME                                                 READY   STATUS              RESTARTS   AGE
kube-flannel   kube-flannel-ds-gjq5h                                0/1     Init:0/2            0          3m33s
kube-flannel   kube-flannel-ds-vctrt                                1/1     Running             0          41m
kube-system    coredns-5d78c9869d-dqnkh                             1/1     Running             0          47m
kube-system    coredns-5d78c9869d-rbmhm                             1/1     Running             0          47m
kube-system    etcd-illustrious.thighhighs.top                      1/1     Running             2          47m
kube-system    kube-apiserver-illustrious.thighhighs.top            1/1     Running             2          47m
kube-system    kube-controller-manager-illustrious.thighhighs.top   1/1     Running             0          47m
kube-system    kube-proxy-ljk6c                                     1/1     Running             0          47m
kube-system    kube-proxy-xpv58                                     0/1     ContainerCreating   0          3m33s
kube-system    kube-scheduler-illustrious.thighhighs.top            1/1     Running             2          47m
root@illustrious:~# kubectl get events --namespace=kube-system | grep pod/kube-proxy-xpv58
4m29s   Normal    Scheduled                pod/kube-proxy-xpv58   Successfully assigned kube-system/kube-proxy-xpv58 to persica1
9s      Warning   FailedCreatePodSandBox   pod/kube-proxy-xpv58   Failed to create pod sandbox: open /run/systemd/resolve/resolv.conf: no such file or directory
# on persica1
mkdir /run/systemd/resolve
ln -s /etc/resolv.conf /run/systemd/resolve/resolv.conf
wtf now there's another error:
root@illustrious:~# kubectl get events --namespace=kube-system | grep pod/kube-proxy-grqhf
20s     Normal    Scheduled                pod/kube-proxy-grqhf   Successfully assigned kube-system/kube-proxy-grqhf to persica1
6s      Warning   FailedCreatePodSandBox   pod/kube-proxy-grqhf   Failed to create pod sandbox: rpc error: code = InvalidArgument desc = failed to create containerd container: create container failed validation: container.Runtime.Name must be set: invalid argument
I think I haven't deployed a good containerd config everywhere yet. Deployed that, and suddenly the damn kube-proxy and kube-flannel containers are working.
Now I can add the other two nodes, still need to fix the resolv.conf manually.
root@illustrious:~# kubectl get nodes -o wide
NAME                         STATUS   ROLES           AGE     VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                    KERNEL-VERSION                CONTAINER-RUNTIME
illustrious.thighhighs.top   Ready    control-plane   78m     v1.27.1   192.168.1.12   <none>        Ubuntu 22.04.2 LTS          5.15.0-69-generic             containerd://1.6.20
persica1                     Ready    <none>          21m     v1.27.0   192.168.1.31   <none>        AlmaLinux 9.1 (Lime Lynx)   5.14.0-162.6.1.el9_1.x86_64   containerd://1.6.20
persica2                     Ready    <none>          2m41s   v1.27.0   192.168.1.32   <none>        AlmaLinux 9.1 (Lime Lynx)   5.14.0-162.6.1.el9_1.x86_64   containerd://1.6.20
persica3                     Ready    <none>          33s     v1.27.0   192.168.1.33   <none>        AlmaLinux 9.1 (Lime Lynx)   5.14.0-162.6.1.el9_1.x86_64   containerd://1.6.20
Good enough for now!
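The manual resolv.conf fix has to be redone because /run is a tmpfs, so the hand-made symlink is gone after every reboot. systemd-tmpfiles can recreate it at boot instead. My guess at the root cause, not verified here: kubeadm on the Ubuntu control plane saw systemd-resolved and put resolvConf: /run/systemd/resolve/resolv.conf into the kubelet config the nodes picked up, while the Alma nodes don't run systemd-resolved. Setting resolvConf: /etc/resolv.conf in /var/lib/kubelet/config.yaml on each node and restarting kubelet would be the other fix. The tmpfiles filename and function name below are hypothetical.

```shell
# Sketch: recreate /run/systemd/resolve/resolv.conf on every boot.
write_resolv_tmpfile() {
  # $1 = output file, e.g. /etc/tmpfiles.d/kubelet-resolv.conf
  cat > "$1" <<'EOF'
# Type Path                              Mode User Group Age Argument
d      /run/systemd/resolve              0755 root root  -   -
L      /run/systemd/resolve/resolv.conf  -    -    -     -   /etc/resolv.conf
EOF
}

# write_resolv_tmpfile /etc/tmpfiles.d/kubelet-resolv.conf
# systemd-tmpfiles --create /etc/tmpfiles.d/kubelet-resolv.conf
```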
Making ingress work
I don't understand this well enough, but I want to use ingress-nginx. Here's a page about it, albeit not using raw kubectl: https://kubernetes.github.io/ingress-nginx/kubectl-plugin/
Maybe this one too: https://medium.com/tektutor/using-nginx-ingress-controller-in-kubernetes-bare-metal-setup-890eb4e7772
Making load balancing work
I thought I wouldn't need it, but it looks like I do if I want sensible, useful functionality. Here's an explanation of why I want to use MetalLB, and it's not just for BGP-based configs: https://github.com/kubernetes/ingress-nginx/blob/main/docs/deploy/baremetal.md
I'll use it in L2 mode with ARP/NDP I think. Just need to dedicate a bunch of IPs to it so it can manage the traffic to them.
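A sketch of what the L2 setup might look like, using MetalLB's IPAddressPool and L2Advertisement resources (metallb.io/v1beta1), applied after installing MetalLB's native manifest. The address range is hypothetical (it needs to be a block the DHCP server won't hand out), and METALLB_VERSION is a placeholder for whichever release the MetalLB install docs currently point at. Resource and function names are my own.

```shell
# Sketch: MetalLB L2 (ARP/NDP) pool for persica.
METALLB_VERSION="v0.13.12"                    # assumption: check the docs
METALLB_RANGE="192.168.1.240-192.168.1.250"   # hypothetical address block

metallb_l2_manifest() {
  # $1 = address range "first-last" (a CIDR also works)
  cat <<EOF
apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: persica-pool
  namespace: metallb-system
spec:
  addresses:
    - $1
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: persica-l2
  namespace: metallb-system
spec:
  ipAddressPools:
    - persica-pool
EOF
}

# kubectl apply -f "https://raw.githubusercontent.com/metallb/metallb/$METALLB_VERSION/config/manifests/metallb-native.yaml"
# kubectl -n metallb-system wait --for=condition=Ready pod --all --timeout=120s
# metallb_l2_manifest "$METALLB_RANGE" | kubectl apply -f -
```

With the pool in place, a Service of type LoadBalancer (such as the ingress-nginx controller's) should get an address from that range.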