Almost private-only Proxmox cluster

If you wonder whether it is possible to have a private-only Proxmox cluster at Hetzner, the answer is yes, almost. Of course you can order dedicated hardware to hide your boxes from public eyes, but if you are not going that way, you can try another approach.

  • Install the first Proxmox with a public IP, VLAN and pfSense, as usual
  • The second step is to install another Proxmox, also with a public IP, set up everything you need, and then leave it with the VLAN only by deleting the public IP configuration. In /etc/hosts you need to set the VLAN address and reload the network interfaces. After this you need to work from the first box, as box number 2 is no longer reachable from the outside
  • From the first box, create the cluster and join the second box (working from a sandbox VM inside the VLAN)
  • In order to have internet connectivity from the private-only boxes, you need to route their VLAN traffic through the first box's VLAN address
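The cluster steps above can be sketched with Proxmox's own pvecm tool. The cluster name and VLAN addresses below are placeholders, and the --link0 option assumes Proxmox VE 6 or newer (older releases used --ring0_addr):

```shell
# On the first box: create the cluster, binding corosync to the VLAN address
pvecm create cluster1 --link0 10.x.x.1

# On the second box (reached via the sandbox VM on the VLAN):
# join through the first box's VLAN address
pvecm add 10.x.x.1 --link0 10.x.x.2

# Verify quorum from either node
pvecm status
```

Binding the corosync link to the VLAN address matters here: the second box has no public IP left, so cluster traffic must stay on the VLAN.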

Configuration of the first box:

auto DEV
iface DEV inet manual

auto vmbr0
iface vmbr0 inet static
        address PUBLIC-IP/26
        gateway PUBLIC-GW
        bridge-ports DEV
        bridge-stp off
        bridge-fd 0
        pointopoint PUBLIC-GW
        up route add -net PUBLIC-NET netmask MASK gw PUBLIC-GW dev vmbr0
        up ip route add SECOND-PUBLIC-IP/32 dev vmbr0
#PUBLIC

iface DEV.4xxx inet manual
auto vmbr4xxx
iface vmbr4xxx inet static
        address 10.x.x.x/16
        bridge-ports DEV.4xxx
        bridge-stp off
        bridge-fd 0
        mtu 1400
#VLAN

Configuration of the second, private-only, box:

iface DEV.4xxx inet manual

iface DEV inet manual

auto vmbr4xxx
iface vmbr4xxx inet static
        address 10.x.x.x/16
        gateway PFSENSE-AT-1ST-BOX
        bridge-ports DEV.4xxx
        bridge-stp off
        bridge-fd 0
        mtu 1400
#VLAN
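If the gateway on the private box points at pfSense running on the first node (as in the config above), pfSense itself handles NAT towards the internet. If instead you route directly through the first box's host, a minimal sketch would be IP forwarding plus masquerade; the sysctl and iptables rule below are my assumption, not part of the original setup:

```shell
# On the first box: enable forwarding between the VLAN bridge and the public bridge
sysctl -w net.ipv4.ip_forward=1

# Masquerade VLAN traffic leaving through the public bridge
# (10.x.0.0/16 stands for the VLAN range used on vmbr4xxx)
iptables -t nat -A POSTROUTING -s 10.x.0.0/16 -o vmbr0 -j MASQUERADE
```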

oc rsync takes down OKD master processes

It might sound a little weird, but that’s the case. I was trying to set up an NFS mount for the OKD docker registry (following this tutorial). During oc rsync from inside the docker-registry container I found that the OKD master processes went down, because the health check thought there was a connectivity problem. This arose because oc rsync has no rate-limiting feature, and if I fully utilize the local network, there is no bandwidth left for the cluster itself.

A few things taken from the logs (/var/log/messages):

19,270,533,120  70%   57.87MB/s    0:02:19  The connection to the server okd-master:8443 was refused - did you specify the right host or port?
Liveness probe for "okd-master.local_kube-system (xxx):etcd" failed (failure): member xxx is unhealthy: got unhealthy result
okd-master origin-node: cluster is unhealthy

The transfer from the docker-registry container starts at a rate of 200MB/s. I’m not quite sure the network is actually capable of such speed. The problem is repeatable: after the liveness probe is triggered, the master, etcd and webconsole are restarted, which could lead to an unstable cluster. We should avoid it if possible. Unfortunately, the docker-registry container is a very basic one, without ip, ifconfig, ssh, scp or any utilities which could help with transferring files out. But…

  • you can check the IP of the container in the webconsole
  • you can start an HTTP server with python -m SimpleHTTPServer on port 8000
  • you can then download the file with wget x.x.x.x:8000/file.tar --limit-rate=20000k
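Put together, the workaround might look like this. The pod name and IP are placeholders; `oc get pod` and `oc exec` are standard oc commands, and SimpleHTTPServer assumes the Python 2 interpreter shipped in the registry image:

```shell
# Find the registry pod and its IP (also visible in the webconsole)
oc get pod -n default -o wide | grep docker-registry

# Start a throwaway HTTP server inside the container
# (blocks this terminal until you stop it)
oc exec docker-registry-1-xxxxx -- python -m SimpleHTTPServer 8000

# From the target host, pull the file with a bandwidth cap
# so the liveness probes still get through
wget x.x.x.x:8000/file.tar --limit-rate=20000k
```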

It is really funny that the container lacks basic tools but has Python. Set the rate in wget to a reasonable level so that the internal network will not be fully utilized. To sum up: I did not encounter this problem in any other environment, either bare-metal or virtualized, so it might be related specifically to the Microsoft Azure SDN and how it behaves under such traffic load.