Almost private-only Proxmox cluster

If you wonder whether it is possible to have a private-only Proxmox cluster at Hetzner, the answer is yes, almost. Of course you can order dedicated hardware to hide your boxes from public eyes, but if you are not going that way, you can try another approach.

  • Install the first Proxmox with a public IP, VLAN and pfSense, as usual
  • The second step is to install another Proxmox, also with a public IP, set up everything you need, and then leave it with the VLAN only by deleting the public IP configuration. In /etc/hosts you need to set the VLAN address and reload the network interfaces. After this you need to work from the first box, as box number 2 is no longer reachable from the outside
  • From the first box, create the cluster and join the second box (working from a sandbox VM inside the VLAN)
  • In order to have internet connectivity from the private-only boxes, you need to route their VLAN traffic through the first box's VLAN address
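The cluster steps above can be sketched with Proxmox's own pvecm tool. The cluster name and VLAN addresses below are placeholders, and the --link0 option assumes Proxmox VE 6 or newer (older releases used --ring0_addr):

```shell
# On the first box: create the cluster, binding corosync to the VLAN address
pvecm create cluster1 --link0 10.x.x.1

# On the second box (reached via the sandbox VM on the VLAN):
# join through the first box's VLAN address
pvecm add 10.x.x.1 --link0 10.x.x.2

# Verify quorum from either node
pvecm status
```

Binding the corosync link to the VLAN address matters here: the second box has no public IP left, so cluster traffic must stay on the VLAN.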

Configuration of the first box:

auto DEV
iface DEV inet manual

auto vmbr0
iface vmbr0 inet static
        address PUBLIC-IP/26
        gateway PUBLIC-GW
        bridge-ports DEV
        bridge-stp off
        bridge-fd 0
        pointopoint PUBLIC-GW
        up route add -net PUBLIC-NET netmask MASK gw PUBLIC-GW dev vmbr0
        up ip route add SECOND-PUBLIC-IP/32 dev vmbr0
#PUBLIC

iface DEV.4xxx inet manual
auto vmbr4xxx
iface vmbr4xxx inet static
        address 10.x.x.x/16
        bridge-ports DEV.4xxx
        bridge-stp off
        bridge-fd 0
        mtu 1400
#VLAN

Configuration of the second, private-only, box:

iface DEV.4xxx inet manual

iface DEV inet manual

auto vmbr4xxx
iface vmbr4xxx inet static
        address 10.x.x.x/16
        gateway PFSENSE-AT-1ST-BOX
        bridge-ports DEV.4xxx
        bridge-stp off
        bridge-fd 0
        mtu 1400
#VLAN
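If the gateway on the private box points at pfSense running on the first node (as in the config above), pfSense itself handles NAT towards the internet. If instead you route directly through the first box's host, a minimal sketch would be IP forwarding plus masquerade; the sysctl and iptables rule below are my assumption, not part of the original setup:

```shell
# On the first box: enable forwarding between the VLAN bridge and the public bridge
sysctl -w net.ipv4.ip_forward=1

# Masquerade VLAN traffic leaving through the public bridge
# (10.x.0.0/16 stands for the VLAN range used on vmbr4xxx)
iptables -t nat -A POSTROUTING -s 10.x.0.0/16 -o vmbr0 -j MASQUERADE
```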

oc rsync takes down OKD master processes

It might sound a little weird, but that’s the case. I was trying to set up an NFS mount for the OKD docker registry (following this tutorial). During oc rsync from inside the docker-registry container I found that the OKD master processes went down, because the health check thought there was a connectivity problem. This arose because oc rsync has no rate-limiting feature, and if I fully utilize the local network, there is no bandwidth left for the cluster itself.

A few things taken from the logs (/var/log/messages):

19,270,533,120  70%   57.87MB/s    0:02:19  The connection to the server okd-master:8443 was refused - did you specify the right host or port?
Liveness probe for "okd-master.local_kube-system (xxx):etcd" failed (failure): member xxx is unhealthy: got unhealthy result
okd-master origin-node: cluster is unhealthy

The transfer from the docker-registry container starts at a rate of 200MB/s. I’m not quite sure the network is actually capable of such speed. The problem is repeatable: after the liveness probe is triggered, the master, etcd and webconsole are restarted, which could lead to an unstable cluster. We should avoid it if possible. Unfortunately, the docker-registry container is a very basic one, without ip, ifconfig, ssh, scp or any utilities which could help with transferring files out. But…

  • you can check the IP of the container in the webconsole
  • you can start an HTTP server with python -m SimpleHTTPServer on port 8000
  • you can then download the file with wget x.x.x.x:8000/file.tar --limit-rate=20000k
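Put together, the workaround might look like this. The pod name and IP are placeholders; `oc get pod` and `oc exec` are standard oc commands, and SimpleHTTPServer assumes the Python 2 interpreter shipped in the registry image:

```shell
# Find the registry pod and its IP (also visible in the webconsole)
oc get pod -n default -o wide | grep docker-registry

# Start a throwaway HTTP server inside the container
# (blocks this terminal until you stop it)
oc exec docker-registry-1-xxxxx -- python -m SimpleHTTPServer 8000

# From the target host, pull the file with a bandwidth cap
# so the liveness probes still get through
wget x.x.x.x:8000/file.tar --limit-rate=20000k
```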

It is really funny that the container lacks basic tools but has Python. Set the rate in wget to a reasonable level so that the internal network will not be fully utilized. To sum up: I did not encounter this problem in any other environment, either bare-metal or virtualized, so it might be related specifically to the Microsoft Azure SDN and how it behaves under such traffic load.