GPU pass-thru in Proxmox 7 and Ubuntu 20, follow-up

In the previous article about GPU pass-thru, which can be found here, I described how to set things up mostly from the Proxmox perspective. This is a short follow-up from the VM perspective, just to make things clear.

I mentioned that you need to set up a q35 machine with VirtIO-GPU and UEFI. That is true, but the most important thing is to actually disable Secure Boot, which otherwise prevents the NVIDIA driver modules from loading.

Add an EFI disk, but do not check “Pre-Enroll keys”. That option would enroll keys and enable Secure Boot by default. Just add the EFI disk without selecting this option, and after starting the VM you should be able to see your GPU in nvidia-smi.
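
If you prefer the CLI over the web UI, a rough equivalent looks like this; the VM ID 100 and the local-lvm storage name are assumptions, and the pre-enrolled-keys flag needs a reasonably recent Proxmox 7:

qm set 100 --machine q35 --bios ovmf
qm set 100 --efidisk0 local-lvm:1,efitype=4m,pre-enrolled-keys=0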

That is all.

Configuring NVIDIA RTX A6000 ADA in Ubuntu 22

I thought that installing the NVIDIA RTX A6000 ADA in a default Ubuntu 22 server installation would be easy. However, installing drivers from the repository did no good. I verified whether Secure Boot was enabled, and no, it was disabled.

We need to install a few things first:

sudo apt-get install linux-headers-$(uname -r)
sudo apt install build-essential
sudo apt install mokutil

We need to get rid of previously installed drivers:

sudo apt remove --purge '^nvidia-.*'
sudo apt autoremove
sudo apt clean

Verify that Secure Boot is disabled:

mokutil --sb-state

Get the NVIDIA driver, such as NVIDIA-Linux-x86_64-535.216.01.run, from their website and install it:

chmod +x NVIDIA-Linux-x86_64-535.216.01.run
sudo ./NVIDIA-Linux-x86_64-535.216.01.run

Once you have gotten rid of previously installed drivers, disabled Secure Boot and installed the build tools and kernel headers, you will be good to go to compile the driver module. In my case this was the only way to get nvidia-smi to recognize this GPU in Ubuntu 22.
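
A quick sanity check that the module actually compiled and loaded (nothing assumed beyond the installer having finished):

lsmod | grep nvidia
nvidia-smi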

Deploy 20 x Docker Swarm nodes using Terraform and Ansible

If you wonder how to automatically deploy 20 Docker Swarm nodes and run 100 Docker containers on them, then continue reading. I will show how to achieve this using Terraform, Ansible and Portainer.

Course of action

  • Terraform 20 x Ubuntu virtual machines
  • Install Docker Swarm using Ansible
  • Install Portainer
  • Deploy 100 containers across Swarm cluster

What is Docker Swarm and why do I need 20 of these?

Docker is a container toolkit utilizing cgroups and namespaces, which allow controlling and sharing CPU and operating system resources. Docker Swarm is a special runtime mode which allows running multiple clustered nodes that can be separate physical computers or virtual machines. It gives us scalability and resource separation while keeping everything within the same management utilities. You can make the work much easier by installing Portainer CE, a management UI for container orchestration (de facto operations management).

So back to the question: why 20 of these? You can have a single Docker Swarm node acting as both manager and worker and give it all the resources you have, like CPU and RAM. But for the sake of easier maintenance, more even resource utilization and resource isolation, you use a clustered mode such as Docker Swarm with more than one node.

What are Terraform and Ansible, and what is the whole automation thing about?

Terraform is a tool for automating the construction of systems, for instance provisioning virtual machines. You can also do that with Ansible, but its role here is more to manage already provisioned systems than to provision them. So both tools could, in principle, be used for all the tasks, however Terraform with the Telmate Proxmox plugin does the provisioning part in the easiest way. I use Ansible to automate tasks across the resources created with Terraform. This is my way; yours might be different.

Why deploy 100 containers of the same application?

If your application perfectly handles errors and tooling outages, is capable of running multiple processes with multiple threads and you know it will never be redeployed, then stick with 1 container. In any other case, having multiple containers, i.e. instances, of the same application will be beneficial. You increase your fault tolerance and make software releases easier. You also need not bother that much about application server configuration and the like, because much of that is mitigated by deploying N > 1 instances.

You can have 2 containers for the frontend application, 2 containers for the backend application, 1 container for background processing and many other combinations of your single or multiple repositories. You could have 50 frontends and 50 backends; it depends on the case. You could introduce auto-scaling, which by the way is present in OKD, OpenShift and Kubernetes, but Docker Swarm and Portainer lack such a feature. It is unfortunate, but you can still implement it yourself or plan and monitor your resource usage. On dedicated hardware auto-scaling is not so important; just over-allocate for future peaks. With public cloud providers, where you pay for what you use, it becomes important to develop an auto-scaling feature.

Terraform 20 Ubuntu VM

In order to deploy more than one VM using Terraform and the Telmate Proxmox provider plugin, you either copy the resource section multiple times or use the count notation. I defined terraform and provider sections as well as two resource sections, each for a different target Proxmox server. By using count, you get the ability to interpolate ${count.index} for each consecutive instance of the resource. I used it for the name and the network IP address. The target server is differentiated using target_node. Be sure to use the appropriate clone name and the disk storage name on which your VMs will be placed.

terraform {
    required_providers {
        proxmox = {
            source  = "telmate/proxmox"
            version = "2.9.0"
        }
    }
}

provider "proxmox" {
    pm_api_url      = "https://192.168.2.11:8006/api2/json"
    pm_user         = "root@pam"
    pm_password     = "xxx"
    pm_tls_insecure = true
    pm_timeout       = 300
    pm_parallel      = 2
}

resource "proxmox_vm_qemu" "ubuntu_vm_xx" {
    count       = 10
    name        = "z10-ubuntu-22-from-terraform-2-${count.index}"
    target_node = "lab2" 
    clone       = "new-template"
    full_clone  = true
    memory      = 4000
    cores       = 2 
    bootdisk = "virtio0"
    network {
        bridge = "vmbr0"
        model = "virtio"
    }
    disk {
        storage = "single-dir-ssd-256GB"
        size = "10G"
        type = "virtio"
    }
    os_type = "cloud-init"
    ipconfig0 = "ip=192.168.2.${count.index}/22,gw=192.168.1.1"
    ciuser = "xxx"
    cipassword = "xxx"
}

resource "proxmox_vm_qemu" "ubuntu_vm_lab" {
    count       = 10
    name        = "z10-ubuntu-22-from-terraform-3-${count.index}"
    target_node = "lab" 
    clone       = "z10-ubuntu-22-template-RAW"
    full_clone  = true
    memory      = 4000
    cores       = 2 
    bootdisk = "virtio0"
    network {
        bridge = "vmbr0"
        model = "virtio"
    }
    disk {
        storage = "vms1"
        size = "10G"
        type = "virtio"
    }
    os_type = "cloud-init"
    ipconfig0 = "ip=192.168.3.${count.index}/22,gw=192.168.1.1"
    ciuser = "xxx"
    cipassword = "xxx"
}

With the above notation you will create 10+10 Ubuntu VMs. You can run it with:

terraform apply -parallelism=1

After the VMs are created you need to wait until cloud-init finishes its job. If you are not sure whether it is still running, then reboot the VM so you will not get any stuck processes which could collide with the next step, which is installing Docker with Ansible. You can also check it explicitly, as sketched below.
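
A hedged way to check it with an Ansible ad-hoc command (assuming the inventory file defined in the next section and that cloud-init is present in the template):

ansible all -i inventory -m shell -a "cloud-init status --wait" --become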

Install Docker service with Ansible

First we define the inventory:

[lab]
192.168.2.0
192.168.2.1
192.168.2.2
192.168.2.3
192.168.2.4
192.168.2.5
192.168.2.6
192.168.2.7
192.168.2.8
192.168.2.9
192.168.3.0
192.168.3.1
192.168.3.2
192.168.3.3
192.168.3.4
192.168.3.5
192.168.3.6
192.168.3.7
192.168.3.8
192.168.3.9

[lab:vars]
ansible_user=xxx
ansible_password=xxx

Second, we define and run the installation of required packages:

---
- name: Docker reqs installation
  hosts: all
  vars:
    ansible_ssh_common_args: '-o ServerAliveInterval=60'

  tasks:
    - name: Install aptitude
      apt:
        name: aptitude
        state: latest
        update_cache: true

    - name: Install required system packages
      apt:
        pkg:
          - apt-transport-https
          - ca-certificates
          - curl
          - software-properties-common
          - python3-pip
          - virtualenv
          - python3-setuptools
        state: latest
        update_cache: true

And finally the installation of Docker itself:

---
- name: Docker installation
  hosts: all
  vars:
    ansible_ssh_common_args: '-o ServerAliveInterval=60'

  tasks:
    - name: Add Docker GPG apt Key
      apt_key:
        url: https://download.docker.com/linux/ubuntu/gpg
        state: present

    - name: Add Docker Repository
      apt_repository:
        repo: deb https://download.docker.com/linux/ubuntu jammy stable
        state: present

    - name: Update apt and install docker-ce
      apt:
        name: docker-ce
        state: latest
        update_cache: true

    - name: Install Docker Module for Python
      pip:
        name: docker
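
Assuming the two playbooks above are saved as docker-reqs.yml and docker-install.yml (file names are mine, not from the original setup), they can be run against the whole inventory like this:

ansible-playbook -i inventory docker-reqs.yml --become
ansible-playbook -i inventory docker-install.yml --become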

We could include those steps in the Packer configuration, and that way Docker with its requirements would be baked in by default. However, it is good to know not only Packer and Terraform, but also how to do it from Ansible.

Configure Docker Swarm and join cluster

I decided to configure a single manager with Portainer, so I picked 192.168.2.0 for this job:

sudo docker swarm init
curl -L https://downloads.portainer.io/ce2-21/portainer-agent-stack.yml -o portainer-agent-stack.yml
sudo docker stack deploy -c portainer-agent-stack.yml portainer

Now we have Docker Swarm initialized and the Portainer stack installed. After initializing Swarm mode you get a token for including new nodes. You can join more manager nodes, but for a simple installation demo you can stick with a single one and join the additional 19 worker nodes using an Ansible command:

ansible all -i inventory -m shell -a "sudo docker swarm join --token SWMTKN-1-4ph10h6ck7aqonnr7bunok3be63nl7itnjkua4owm0cvmyh2z1-754wirc3y36igf86porgwtbjh 192.168.2.0:2377" --become -o -f 1
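
If you need the worker join token again later, the manager can print it at any time; this is standard Docker CLI, run on 192.168.2.0:

sudo docker swarm join-token worker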

In Portainer there is a cluster visualizer, where you can see all nodes and what is running inside them:

Running container workloads

Using Portainer CE you can scale service instances, i.e. containers, by just entering the number of Docker containers to run. You can either use replicated mode, where you explicitly define how many containers you would like to start, or global mode, where the number of containers automatically equals the number of nodes in your Docker Swarm cluster.

The Docker Swarm scheduler will try to place containers evenly according to the service definition and hardware capabilities. You can gradually increase the number of instances and monitor hardware resource usage. Deployment webhooks, deployment strategies and the like are a whole separate topic.
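
For reference, the CLI equivalents of what Portainer does here look roughly like this (service names are arbitrary examples):

# replicated mode: explicitly ask for a number of tasks, then scale it up
sudo docker service create --name demo-nginx --replicas 2 nginx
sudo docker service scale demo-nginx=100
# global mode: one task per node in the cluster
sudo docker service create --name demo-agent --mode global nginx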

Performance comparison scenario

Provisioning is run with the terraform apply -parallelism=1 command. On older hardware I suggest going for a one-by-one strategy instead of a high parallelism level, which could lead to unexpected behavior like disk clone timeouts or other issues.

With that step done we can see how those two Terraform resource sections turned into 5+5 virtual machines on two Proxmox nodes. Terraform keeps track of the baseline/state of your deployment; however, it is not 100% safe to rely on it alone. It is good to double-check the results. In my tests I experienced a situation where Terraform said it had destroyed all the content, but it actually had not. The same goes for resource creation: even if you are told that everything is done, be sure to check it. The problem may lie in at least 4 places: Terraform itself, the Telmate Proxmox provider, the Proxmox Go API and finally Proxmox itself with its hardware and software.

With both apply and destroy you will be shown the proposed changes based on your configuration; you can then review what is going to happen and whether it fits your needs:

It is crucial to know what hardware you are working on, at least from a performance perspective. In case of Proxmox VE and bare-metal hardware, what you see is what you get. The same approach can be applied to many other platform providers: you can check what the specific AWS or Azure virtual machine types actually deliver. To illustrate it, I compared server load between 2 Proxmox nodes within the same cluster. Here is one with 2 x Intel Xeon X5560:

And here you have 2 x Intel Xeon E5645:

You can see that the difference in theoretical CPU performance is confirmed in a real scenario. The first server gets a load of up to 14 and the second one up to 16. There is also a difference in RAM usage. The same goes for drive performance. All those factors can be important when running plenty of Ansible or Terraform tasks concurrently.

Conclusion

With Terraform, Ansible and even Packer, you can easily deploy multiple virtual resources and scale your application deployments.

Use Packer & Terraform to generate Ubuntu 22.04-4 server image and deploy it automatically to Proxmox

If you wonder how to automate Ubuntu virtual machine creation and then deploy it to Proxmox in multiple copies, then you are looking for Packer and Terraform.

Side note: going for virtual machines in Proxmox is the proper way. I tried for several days to get LXC containers working, but in the end I would say it is not the best option, with a lot of things going wrong around cgroups, AppArmor, nesting, FUSE, ingress networking etc. There is simply too much to handle with LXC, and with VMs there is no such problem, so the discussion ends here in favour of Proxmox QEMU. Keep LXC containers for simple things.

Why automate?

Because we can.

Because it is a better way of using our time.

Because it scales better.

Because it provides some form of self-documentation.

Why use Proxmox and Ubuntu VMs?

Ubuntu is a leading Linux distribution without licensing issues, with around 34% of the Linux market share. It has a strong user base. It is also my personal preference. It gives us the ability to subscribe to Ubuntu Pro, which comes with several compliance utilities.

Proxmox/QEMU became an enterprise-class virtualization software package a few years back and is also a leading open-source solution in its field. It contains clustering features (including failover) as well as support for various storage types. Depending on the source, it has around 1% of the virtualization software market share.

Installation of Packer and Terraform

It is important to have both Packer and Terraform in their proper versions coming from the official repositories. Moreover, the exact way of building a specific version of the operating system differs from version to version; that is why the title of this article says 22.04-4 and not 22.04-3, because there might be some differences.

Install a valid version of Packer. The version which comes from the Ubuntu packages is outdated and does not contain the ability to manage plugins, so be sure to install Packer from the official HashiCorp repository.

curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -
sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"
sudo apt-get update && sudo apt-get install packer

Install a valid version of Terraform. Having had issues with the distribution version of Packer, I went for the official way of installing Terraform on the first try:

sudo apt-get update && sudo apt-get install -y gnupg software-properties-common

wget -O- https://apt.releases.hashicorp.com/gpg | \
gpg --dearmor | \
sudo tee /usr/share/keyrings/hashicorp-archive-keyring.gpg > /dev/null

gpg --no-default-keyring \
--keyring /usr/share/keyrings/hashicorp-archive-keyring.gpg \
--fingerprint

echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] \
https://apt.releases.hashicorp.com $(lsb_release -cs) main" | \
sudo tee /etc/apt/sources.list.d/hashicorp.list

sudo apt update

sudo apt-get install terraform

Terraform Telmate/Proxmox

An important note regarding Terraform and its plugin for Proxmox. This plugin, as well as the Proxmox Go API, is provided by a single company, Telmate LLC. The plugin has some compatibility issues and at the moment, for Proxmox 7, I recommend using Telmate/proxmox version 2.9.0. The latest version, 2.9.14, has some difficulties with handling cloud-init, which leads to roughly a 50% chance of a VM requiring manual drive reconfiguration. As of 2024/09/06 there is no stable 3.0.1 release.

If you happen to have the latest one and would like to downgrade, then remove .terraform and .terraform.lock.hcl and initialize once again with the following command:

terraform init
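
Put together, the downgrade is roughly the following, run in the Terraform project directory after pinning version = "2.9.0" in required_providers:

rm -rf .terraform .terraform.lock.hcl
terraform init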

Generate Ubuntu 22.04-4 template for Proxmox with Packer

Starting a few versions back, the Ubuntu project changed its way of automating installations. Instead of preseeding you now have the autoinstall feature. The Packer project structure contains a few files, and I will start with ubuntu-22-template/http/user-data containing the cloud-config:

#cloud-config
autoinstall:
  version: 1
  locale: en_US
  ssh:
    install-server: true
    allow-pw: true
    disable_root: true
    ssh_quiet_keygen: true
    allow_public_ssh_keys: true
  packages:
    - qemu-guest-agent
    - sudo
  storage:
    layout:
      name: lvm
      sizing-policy: all
      # password: xxx
  user-data:
    package_upgrade: false
    timezone: Europe/Warsaw
    users:
      - name: temporary
        groups: [sudo]
        lock-passwd: false
        sudo: ALL=(ALL) NOPASSWD:ALL
        shell: /bin/bash
        passwd: "here you place SHA512 generated hash of a password"

In order to turn LUKS on, uncomment the storage.layout.password field and set the desired password. users.passwd can be generated with mkpasswd using SHA-512 (see the sketch after the next snippet). Next is ubuntu-22-template/files/99-pve.cfg:

datasource_list: [ConfigDrive, NoCloud]
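
As mentioned above, the SHA-512 hash for users.passwd can be generated with mkpasswd; a minimal sketch (on Ubuntu the tool ships with the whois package):

sudo apt install whois
mkpasswd -m sha-512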

Credentials get their own file (./credentials.pkr.hcl). You can of course place them directly in your main file, however if you keep those files in source control they will be permanent and shared with others; that is why you should separate this file and not include it in your commits:

proxmox_api_url = "https://192.168.2.10:8006/api2/json"
proxmox_api_token_id = "root@pam!root-token"
proxmox_api_token_secret = "your Proxmox token"
my_ssh_password = "your new VM SSH password"

Finally, there is the ubuntu-22-template/ubuntu-22-raw.pkr.hcl file, where you define variables, source and build. We source the ISO image and define the Proxmox VE QEMU VM parameters. The most crucial and cryptic thing is to provide a valid boot_command. The http_* options refer to your machine serving files over HTTP, while the ssh_* options refer to the remote machine (the newly created VM on Proxmox). Our local machine serves the autoinstall configuration over HTTP, which the remote machine fetches and executes during system installation.

variable "proxmox_api_url" {
    type = string
}
variable "proxmox_api_token_id" {
    type = string
}
variable "proxmox_api_token_secret" {
    type = string
    sensitive = true
}
variable "my_ssh_password" {
    type = string
    sensitive = true
}

source "proxmox-iso" "ubuntu-server-jammy" {
    proxmox_url = "${var.proxmox_api_url}"
    username =    "${var.proxmox_api_token_id}"
    token =       "${var.proxmox_api_token_secret}"
    insecure_skip_tls_verify = true
    node = "lab"
    vm_id = "141"
    vm_name = "z10-ubuntu-22-template-RAW"
    template_description = "Ubuntu Server Raw Encrypted"
    iso_file = "local:iso/ubuntu-22.04.4-live-server-amd64.iso"
    iso_storage_pool = "local"
    unmount_iso = true
    qemu_agent = true
    scsi_controller = "virtio-scsi-single"
    disks {
        disk_size = "10G"
        format = "raw"
        storage_pool = "vms1"
        storage_pool_type = "directory"
        type = "virtio"
    }
    cores = "2"    
    memory = "4096" 
    network_adapters {
        model = "virtio"
        bridge = "vmbr0"
        firewall = "false"
    } 
    cloud_init = true
    cloud_init_storage_pool = "local"
    boot_command = [
        "<esc><wait>",
        "e<wait>",
        "<down><down><down><end>",
        "<bs><bs><bs><bs><wait>",
        "ip=${cidrhost("192.168.2.0/24", 100)}::${cidrhost("192.168.1.0/24", 1)}:${cidrnetmask("192.168.0.0/22")}::::${cidrhost("1.1.1.0/24", 1)}:${cidrhost("9.9.9.0/24", 9)} ",
        "autoinstall ds=nocloud-net\\;s=http://{{ .HTTPIP }}:{{ .HTTPPort }}/ ---<wait>",
        "<f10><wait>"
    ]
    boot = "c"
    boot_wait = "5s"
    http_directory = "ubuntu-22-template/http" 
    http_bind_address = "IP of machine from which you will run Packer"
    http_port_min = 8802
    http_port_max = 8802
    ssh_host = "192.168.2.100" # new VM proposed IP address
    ssh_username = "temporary"
    ssh_password = "${var.my_ssh_password}"
    ssh_timeout = "20m"
}


build {
    name = "ubuntu-server-jammy"
    sources = ["proxmox-iso.ubuntu-server-jammy"]
    provisioner "shell" {
        inline = [
            "while [ ! -f /var/lib/cloud/instance/boot-finished ]; do echo 'Waiting for cloud-init...'; sleep 1; done",
            "sudo rm /etc/ssh/ssh_host_*",
            "sudo truncate -s 0 /etc/machine-id",
            "sudo apt -y autoremove --purge",
            "sudo apt -y clean",
            "sudo apt -y autoclean",
            "sudo cloud-init clean",
            "sudo rm -f /etc/cloud/cloud.cfg.d/subiquity-disable-cloudinit-networking.cfg",
            "sudo rm -f /etc/netplan/00-installer-config.yaml",
            "sudo sync"
        ]
    }
    provisioner "file" {
        source = "ubuntu-22-template/files/99-pve.cfg"
        destination = "/tmp/99-pve.cfg"
    }
    provisioner "shell" {
        inline = [ "sudo cp /tmp/99-pve.cfg /etc/cloud/cloud.cfg.d/99-pve.cfg" ]
    }
}

To run it:

packer build -var-file=credentials.pkr.hcl ubuntu-22-template/ubuntu-22-raw.pkr.hcl

The installation process is automated and you do not see the usual configuration screens. Instead we provide the autoinstall configuration and leave a few options, namely user details and network configuration, to be set up later during cloud-init. This way we can automate deployments of such a system, which will be shown in a moment in the Terraform section of this article.

A full overview of the project structure is as follows:

├── credentials.pkr.hcl
├── main.tf
├── packer_cache
│   └── port
├── terraform.tfstate
├── terraform.tfstate.backup
└── ubuntu-22-template
    ├── files
    │   └── 99-pve.cfg
    ├── http
    │   ├── meta-data
    │   └── user-data
    └── ubuntu-22-raw.pkr.hcl

5 directories, 8 files

After a successful Ubuntu installation the system will reboot and be converted into a template, so it can later be used as a base for further systems, either as a linked clone or a full clone. If you prefer more flexibility then opt for a full clone, because you will not have any constraints concerning VM usage and migration.

Deploy multiple Ubuntu VMs with Terraform

You can use a Proxmox VM template and create a new VM from it manually in the Proxmox UI. However, in case of creating 100 VMs it could take a while. This is where Terraform comes in, which with the help of a plugin is able to connect to Proxmox and automate this process for you.

Define a Terraform file (.tf) with terraform, provider and resource sections. The terraform section tells Terraform which plugins you are going to use. The provider section tells it how to access the Proxmox virtualization environment. Finally, the resource section is where you put all the configuration related to your Ubuntu 22.04-4 backed by cloud-init. So we start with terraform and the required provider plugins. Depending on whether your Proxmox version is 7 or 8, you will need to give a different resource configuration:

terraform {
    required_providers {
        proxmox = {
            source  = "telmate/proxmox"
            version = "2.9.0" # this version has the greatest compatibility
        }
    }
}

Next you place the Proxmox provider. It is also possible to define all sensitive data as variables:

provider "proxmox" {
    pm_api_url      = "https://192.168.2.10:8006/api2/json"
    pm_user         = "root@pam"
    pm_password     = "xxx"
    pm_tls_insecure = true
}

First you need to initialize the Terraform “backend” and install the plugins. You can do this with the terraform and provider sections only, if you want. You can also do it after you complete the full spec of your tf file.

terraform init

Finally, the resource itself:

resource "proxmox_vm_qemu" "ubuntu_vm" {
    name        = "z10-ubuntu-22-from-terraform-1-20"
    target_node = "lab" 
    clone       = "z10-ubuntu-22-template-RAW"
    memory      = 4000
    cores       = 2 

    network {
        bridge = "vmbr0"
        model = "virtio"
    }

    disk {
        slot = 0
        storage = "vms1"
        size = "10G"
        type = "virtio"
    }
  
    os_type = "cloud-init"
    ipconfig0 = "ip=192.168.2.20/22,gw=192.168.1.1"
    ciuser = "xxx"
    cipassword = "xxx"
}

To run this Terraform script, first check it with the plan command and execute it with the apply command:

terraform plan
terraform apply

With that, this mechanism is going to fully clone the template into a new virtual machine with the given cloud-init definitions concerning user and network configuration.

I prepared two sample templates, one with LUKS disk encryption and the other without. For demo purposes it is enough to use an unencrypted drive, however for production use encryption should be your default way of installing operating systems.

Checkpoint: we have created an Ubuntu template with Packer and used this template to create a new VM with Terraform.

Further reading

  • https://github.com/Telmate/terraform-provider-proxmox/tree/v2.9.0
  • https://registry.terraform.io/providers/Telmate/proxmox/2.9.0/docs/resources/vm_qemu

Video playback not working on LinkedIn in Opera on Ubuntu 22

On a fresh installation of Ubuntu 22, video playback in Opera can be an issue. Even after installing everything that you may think could help, it does not work. The solution is to install chromium-ffmpeg and copy its libffmpeg.so library into the Opera installation folder.

sudo snap install chromium-ffmpeg
cd /snap/chromium-ffmpeg/xx/chromium-ffmpeg-yyyyyy/chromium-ffmpeg
sudo cp libffmpeg.so /usr/lib/x86_64-linux-gnu/opera/libffmpeg.so

Be aware that the snap installation path differs in a few places, so check your installation. After copying the ffmpeg library, just restart Opera and the videos that previously would not load on LinkedIn will play.
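
To find the exact path on your machine instead of guessing the revision numbers, a plain find will do:

find /snap/chromium-ffmpeg -name libffmpeg.so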

Private cloud for 50€ (Hetzner, Proxmox, pfSense, HAProxy, Docker Swarm, Portainer, Suricata, PBS)

Create a secure, high-performance, affordable environment for your container applications using Hetzner dedicated servers.
For around 50€ per month.

This setup can also be done with different server providers, both dedicated and shared, even on public cloud. This tutorial has not been sponsored by Hetzner or any other vendor. If you are interested in a similar setup, please drop me a message via LinkedIn.

Goal

The goal of this setup is to run Docker containers in Swarm mode in a secure and reliable environment. For the sake of security we enable the Proxmox firewall, the pfSense firewall and Suricata IDS/IPS. For the sake of reliability we configure md RAID and create 3 different, combined backup targets. To turn this setup into a production one, just add two or more Proxmox nodes and two or more Swarm nodes. You will be good to go with your online business.

Hetzner dedicated servers

00:10: Start by visiting the Hetzner portal called Robot, which is for managing dedicated servers. We can either rent a brand new server with the latest hardware available or go to the server auction for slightly older hardware at better prices. For this test I will pick a server from the auction with at least 3 drives inside and 64 GB of memory. For both test and production setups I suggest picking enterprise-grade CPUs like Xeons, which tend to do better in the long term and are much more stable than desktop-grade CPUs.

Once ordered you will have to wait for the order to complete, which should take 15 to 30 minutes most of the time, but in case of a custom setup you may wait up to 5 working days, so be sure to order with enough notice. Your orders will be shown in the Server section, where you can inspect the server, send a remote reboot or order a serial console in case of trouble. Next you give your server a name and add it to a virtual switch called vSwitch. It enables your server to communicate with your other servers bound to the same vSwitch ID. An important thing is that the MTU must be 1400. For our test setup I also order an additional public IP. You need to write a usage explanation, as the IPv4 pool is limited. This additional public IP will be used for the pfSense router/firewall, which will handle all the operations traffic.

Proxmox virtualization

01:52: After the order is completed you receive the root user password over email, so be sure to change it at your earliest convenience. I SSH into the server, where you can see a brief overview of what is inside: what kind of CPU and drives we have and what type of ethernet adapter is installed. System installation can be run using the installimage utility. A few things are important here: SWRAID, SWRAIDLEVEL, HOSTNAME and the partition setup. Be sure to install only on the 2 same-size drives and leave the other one unconfigured. The software mirror (RAID 1) will be handled by md. Be sure not to install the system on spinning drives, as there would be a significant and noticeable IO bottleneck.

After the Debian installation is completed, reboot the server and wait until it is back again. If the server is not coming back, restart it from the Robot panel. SSH in once again and start the installation of Proxmox using my step-by-step tutorial. It covers the system update, adding the required repositories and the installation of Proxmox packages and services. The speed of this process depends on the hardware specification, especially the drives the system is installed on. It is worth pointing out that Proxmox uses a custom Linux kernel and it is suggested to remove the other ones.

With the Proxmox installation completed, it is time to disable the rpcbind service that is enabled by default and listens on port 111 (a sketch of disabling it follows below). It has to be kept off the network if your server sits on the German internet, as it is a government requirement. Since we do not need it in the test setup, we are good to go further with the network configuration. If we use only the public network and the vSwitch, then we need a bridge, public IP routing and a VLAN section. If we were to use an additional local LAN or VLAN, or a separate physical dedicated switch, additional sections would need to be added here. Be sure to double or even triple check your configuration; after the network reload it has to work. Otherwise you need to opt for the remote serial console, which sometimes takes even up to an hour to get. I personally prefer having a consistent naming scheme for servers, VLANs and subnets, as you may notice. Remember to include the 1400 MTU in the VLAN section. After the networking service restart, check local and external connectivity.
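
A minimal sketch of turning rpcbind off (run as root on the Proxmox host; the socket unit is disabled as well so it does not get re-activated):

systemctl disable --now rpcbind.service rpcbind.socket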

As an interesting extra, we try to import the Proxmox CA into the browser certificate store in order to have a clean SSL padlock without any security warnings. For this, we need the certificate common name mapped in /etc/hosts. Later on we are going to configure HAProxy over OpenVPN.

The first configuration we will conduct is the firewall configuration. I put VLAN, LAN and public IPv4 addresses in the Datacenter-Firewall section, which will be applied to all nodes in the Proxmox cluster, if we ever add additional nodes. Datacenter-level configuration is much easier to manage. The firewall will be enabled only after going into Firewall-Options and marking it as enabled. Remember to add your own public IPv4 address so as not to cut off your connection.

pfSense firewall/router

11:54: For handling public traffic I install the pfSense router/firewall, which can also be extended with additional packages providing a wide range of features, like VPN, an IDS/IPS appliance etc. We start by uploading pfSense 2.7 CE, which will also require us to do an upgrade later. Before continuing I review the drives' condition using the SMART interface and quickly initialize the third drive as a directory, needed for the future PBS installation. I also upload the PBS ISO image as well as the Ubuntu ISO image.

Create a new VM with basic settings. It is important to have two network adapters: the first one for the WAN interface and the second one for the VLAN, where we set MTU 1400. On WAN we set the virtual MAC address created earlier in the Robot portal. It is critical to have it on WAN if you are running VMs on the server; missing this setting will cause a MAC bleeding warning from Hetzner and can even get the server locked. The pfSense installation is straightforward, we pick all the default settings. Since we use md for drive mirroring, there is no need to configure this redundancy in pfSense. After the reboot, pfSense asks for the network configuration of both the LAN and WAN interfaces, which needs to be adjusted. At first, the UI is accessible over WAN, as we have no way to get into the LAN/VLAN yet. You could set up a sandbox VM from which you could access the local network, but for this test setup we will continue over WAN until OpenVPN is configured. After any major pfSense configuration change it is good practice to reboot it.

Using WAN without any rules set up means that you have to run pfctl -d to disable the firewall. Log in using the default credentials and go through the UI step-by-step configuration. You can change the WAN DHCP settings and explicitly set the WAN default gateway, as it seems not to be set.

The pfSense dashboard, which is the main page, can be enhanced with a few fancy graphs concerning service and network statistics. The main configuration can be done in System – General Setup or Advanced settings. We set up the domain, timezone and DNS configuration. For the later HAProxy setup you need to disable the automatic WebGUI port redirect and keep the WebGUI on a non-standard port, which is good practice both for the UI and for the OpenVPN services running on the box.

For OpenVPN use we need to create a CA certificate to be able to sign the server certificate, then create that certificate. Next create a pass rule for the OpenVPN server port. Then go to Services-OpenVPN and add a new OpenVPN server with the SSL/TLS + User Auth configuration on TCP, IPv4 only. Be sure to point it at the server certificate; it is easy to miss that or even select some user certificate. Set the tunnel network addressing for client endpoints as well as the local network which will be routed into the environment. It is good to keep the tunnel subnet numbering consistent and to set concurrent connections to some lower value to quickly identify possible misuse.

To use OpenVPN you will need a user with a user certificate. First create the user and then create its certificate. Now comes the quirky part, which is installing additional pfSense packages on an outdated base system. There is an error message which leads us to the System-Update settings. From trial and error I know that it is necessary to update the system first, but it varies from version to version. This time neither the UI upgrade nor the console upgrade worked for me. The solution was to rehash the certificates. On various previous versions there have been other solutions to similar compatibility issues. Not doing it right could brick the system, so be sure to have a backup before starting such a troubleshooting session.

Finally, after rehashing the certificates we can proceed with the upgrade. Without the upgrade, packages were not available. The upgrade process takes a few minutes.

How do you import the OpenVPN user configuration into Ubuntu? Either by using the GUI or the nmcli utility. I find the latter easier and more stable across various Ubuntu versions. Even with an imported configuration you still need to provide the username and password and select “Use this connection only for resources on its network”. To use connections over the OpenVPN interface we need to add appropriate pass rules. If you want to diagnose connectivity with ping, be sure to pass ICMP traffic as well. Check whether the OpenVPN client is connected, assuming you already created the ICMP rule, and in case it still does not work, reboot to shorten the configuration apply time.
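
A sketch of the nmcli route (the file name client.ovpn and the resulting connection name are examples; the import needs the network-manager-openvpn package):

sudo apt install network-manager-openvpn
nmcli connection import type openvpn file client.ovpn
# equivalent of "Use this connection only for resources on its network"
nmcli connection modify client ipv4.never-default yes
nmcli --ask connection up client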

HAProxy

19:10: Proxmox server access can be achieved using HAProxy. First we define the backend, which is the target server at port 8006. As we have nothing to load balance at the moment, it is better to disable health checks. Secondly we define a frontend on some custom port on the LAN interface, with the TCP processing type and the backend set to what we configured a moment ago. In the settings we need to enable HAProxy, define the maximum connections, set the internal stats port and lastly set the max size of the SSL DH parameter.

It is good to clean up unused firewall rules.

We choose port 9443 for the Proxmox UI, and from now on we do not need to use the public WAN interface to access it, as there is a tunneled OpenVPN connection available. Why do we even need HAProxy for the Proxmox UI? Because Proxmox itself does not route through pfSense, which provides the OpenVPN, it has to be accessed either over WAN, over a NAT proxy, or through what we have just made, the HAProxy configuration.

Now, since we have a secured Proxmox UI connection, it is time to set up 2FA with TOTP. There are some caveats here: if you want to create a cluster and then add an additional server to it (it must be empty, without VMs/containers), you need to disable 2FA, add the server to the cluster and then enable it once again. Moreover, you will most probably need to restart the pvestatd service, as it gets out of sync most of the time.
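
The restart itself is a single command, run as root on the Proxmox node:

systemctl restart pvestatd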

Suricata IDS/IPS security appliance

21:03: The interfaces list is empty at the start, so we need to create one, particularly for the WAN interface. What is important here is to enable Block Offenders in Legacy mode, blocking the source IP. As for the sensitivity of the detect-engine, I prefer setting it to either Medium or High. In terms of rules and their categories there are two approaches. The first says that you should disable the technical rules for flows etc., which will certainly decrease the amount of false-positive blocks. However, by doing this we will welcome much more invalid traffic. So there is the second approach, where we leave these enabled and take care of false-positive blocks manually. I recommend enabling almost all of the ET Open rules categories. The most important thing here is to enter the emerging-scan rule set and enable all nmap scan types, as they are disabled by default for some reason unknown to me. This way we will be able to block the most popular scans. The same goes for the emerging-icmp rule set.

To comply with various security policies and standards, it is good to know that it is possible to lengthen the log retention period to, let us say, 3 months. We can also send logs to a remote syslog, configured either in Suricata directly or in the general pfSense settings. At this point we can enable and start Suricata, running on our WAN interface. On the Alerts tab we can monitor security notifications and also unblock source IP addresses directly. On the Blocks tab there is a list of all currently blocked IP addresses, from where we can unblock them or see whether this has already happened in the past, as there is a list of previous incidents.

It did not even take a minute to get some incidents on the list. One is from the technical set of rules, like TCP packet validity verification, and two others are from the ET scan and Spamhaus rule sets. This means that we have been both scanned and contacted by an entity listed in the Spamhaus directory, group 9 to be precise. Running Suricata on a few dozen boxes for a few months will give us millions of alerts and blocks. Sometimes source IP addresses will be repeated between different locations, meaning that someone either scans whole IPv4 blocks, targets the ISP, or targets exactly you as a person or organization running these boxes. On the Alerts tab you can look up reverse DNS for IP addresses as well as try to geolocate them. It may or may not be useful information, depending on your needs for forensic analysis in your business case.

When interconnecting systems, either through the public interface or a VPN tunnel, it is common that the systems will cross-diagnose each other and trigger blocks. To avoid such a situation there is the Pass Lists tab, where you can list your systems' public IP addresses to prevent them from being blocked. Once you have created a pass list containing the external public IP addresses, you can bind it to the interface on the interface edit page.

Ubuntu VM

24:44: A little overview: we purchased (rented) a dedicated server, installed Debian, Proxmox, pfSense and HAProxy, and now we face creating the Ubuntu VM which will later hold a Docker Swarm manager/worker node handled by the Portainer CE orchestration utility. Creating a new VM is straightforward, as we go with the basics. However, concerning the Ubuntu server installation there are a few options. First of all we can use HashiCorp Packer or even Terraform, as there is a provider for Proxmox 7. Moreover, instead of manually running the installer we could create an autoinstall or cloud-init configuration which will do it automatically. For the purpose of this video I will go manually. It is important that every system drive at any provider, including at home, should be secured by encryption such as LUKS; on this topic you can find an article on my blog. For ease of use I prefer to manually enter the LUKS password to decrypt the drive. There are other options as well, but they require additional software or hardware.

After the base system extraction and installation, there are some additional packages to be installed and updates to be fetched and applied as well. It will take a few moments, depending on the hardware of your choice.

Now the system installation is completed and we can eject the ISO image, hit reboot and test LUKS decryption of the system drive.

Docker Swarm + Portainer

26:55: This section starts with the installation of the Docker CE service and tools, as it is a vital part of the Docker Swarm setup, with Portainer used to orchestrate containers. When I look for verified and well-tested setup tutorials I often visit the DigitalOcean website, a great cloud computing provider I have used for over a decade, since I switched to it from Rackspace. Back around the 2010s these companies were off to a good start but were overtaken quite quickly (in the field of public cloud solutions) by Google, Microsoft and Amazon, and instead of competing head-on they chose different business approaches to stay on the market.

Once we have the Docker CE service installed we can continue with initializing Docker Swarm. But before we can do that, we need to manually create the docker_gwbridge and ingress networks with the proper MTU, just before running docker swarm init (see the sketch below). Why is that, you may ask? Because Hetzner requires you to run MTU 1400 and Docker does not support changing the MTU on the fly; it does not follow the host network settings. With that said, you can also leave the networks as they are, but then either do not use the vSwitch or use host-mode port mapping, which binds to the host adapter instead of the ingress network. In a real-world case, though, it is good to be able to create a Docker Swarm with more than one node, at least 3 manager nodes (an odd number is required) and also 3 worker nodes. Such a setup can be spread across a few different physical boxes, connected through the Hetzner vSwitch, which runs at 1 Gbps at its maximum. Even if you do not plan a similar setup, it is good to have the MTU set to the proper value, because otherwise you would be struggling with service migrations after Docker Swarm is initialized and in production use.
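
A hedged sketch of the MTU part; these commands follow the Docker documentation's recipe for customizing docker_gwbridge and the ingress network rather than the exact sequence used in the video, so treat them as a starting point:

# docker_gwbridge with MTU 1400, created before the swarm is initialized
sudo docker network create \
  --opt com.docker.network.bridge.name=docker_gwbridge \
  --opt com.docker.network.driver.mtu=1400 \
  docker_gwbridge

sudo docker swarm init

# swarm init creates the default ingress network; replace it with an MTU 1400 one
# (Docker asks for confirmation before removing the routing-mesh network)
sudo docker network rm ingress
sudo docker network create --driver overlay --ingress \
  --opt com.docker.network.driver.mtu=1400 \
  ingress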

Now regarding Portainer CE, which is a UI for managing Docker containers, Swarm setups and much more. There is a Business Edition with even more features available. I highly recommend it, especially if you are looking for smart and simple solutions for your containers and struggle with OpenShift/OKD/Kubernetes. I have commercial experience with all of them, so I can tell you that there are some things the Docker Swarm and Portainer duo lacks, but those missing pieces can easily be replaced with other solutions. As for the installation, you just download a YML file with the configuration of the containers and so on. As the UI and the agent will run within a custom network, there is also a need to adjust the MTU in this configuration file before deploying.

It takes about a minute to download the images and run them in Docker. The UI is available on port 9443. Initially you create a user with a password and you are in. The primary environment is the local one, and as Portainer runs with a local volume it should stay on this particular node. The agent, however, is deployed in global mode, which means it will be deployed on every available node in the Docker Swarm cluster.

Now it is time for a hello-world container on our newly installed virtualization and container environment. There are two kinds of test images. The first is for checking whether anything works at all and is called hello-world. It checks whether Docker can pull images from the internet or another network and whether container spawning works. As it does not have any long-running task, it will restart after each and every run. So it is better to test with something more useful, such as nginx.

Nginx is an HTTP server, so it is now worth setting up a port mapping which will utilize the internal Docker Swarm network called ingress. It takes a few seconds to download the image and run it. We can already check in the browser, at the port of our choice, whether the service is actually running. And it is. However, it is a local network connection accessible only over the OpenVPN tunnel. To make this nginx server accessible from the public internet we can use HAProxy. For low to mid traffic, which is up to about 20 thousand concurrent users, a single pfSense HAProxy instance will be fine in terms of performance, configuration etc. For bigger deployments I would recommend dynamic DNS, such as the one available on scaleway.com, multiple pfSense boxes with separate public IP addresses, and a separate HAProxy, either on pfSense or standalone.
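
The CLI equivalent of that port mapping, for reference (the service name and published port 8080 are arbitrary choices):

sudo docker service create --name web --publish published=8080,target=80 nginx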

This time, instead of the TCP processing type used for the Proxmox UI, we are going to use http / https (offloading), which terminates TLS on pfSense and reverse-proxies the traffic to the backend server unencrypted, as we do not provide encryption on the nginx side at the moment. What is the benefit of having TLS offloaded onto the pfSense box? First of all, there is less configuration to manage, as we either enter the certificate in the certificate store or use ACME with Let's Encrypt, for instance. Second, it simply separates the environment entry point from the system internals in the local network.

As we now run HAProxy on the WAN interface, we need to create firewall rules for all the ports used in our setup. This test setup covers only unencrypted traffic. Encrypted traffic will also require an HTTPS pass rule as well as a redirect scheme on the HAProxy frontends. It is good practice in pfSense to organize rules using separators; it just makes everything much clearer. Be sure to either provide a proper health check and have multiple backends, or just disable this feature if you have only one backend.

Proxmox Backup Server

33:44: The last topic in our test setup is backup, which is a very important thing aside from the functional features of this environment. We start by creating the PBS VM with some basic settings. The system is Debian-based and does not require a big root partition, but it requires quite a lot of memory, which becomes even more important if we have slow drives. Remember that encryption of backups happens on the client side, in Proxmox. The PBS installation is straightforward, without any quirks. For now we have only one drive, which is for system use.

Once the installation is done we add an additional drive which will be used as backup storage. Remember to uncheck the backup checkbox. We can add the drive live, while the system is running. The easiest option is to create a directory which will be mounted with either an ext4 or xfs filesystem. Depending on the drive size, the initialization process will take from a minute up to 15 minutes or so.

In order to add PBS to Proxmox VE you need to go to the Datacenter level – Storage and create a new entry with the PBS integration. The username requires adding @pam. You also need to provide the name of the datastore we just created. Finally, to authorize the integration you need to grab the fingerprint from the PBS UI. Once added, the PBS integration with its datastore will be present in the storage section on the selected nodes.

To create a backup job, go to Datacenter-Backup. You can choose between a local backup and a PBS backup. First we define what to back up and on what schedule. It is very important to configure client-side encryption, which can be done by editing the PBS integration on the Encryption tab. Auto-generate the key and secure it by keeping it safe in at least 2 locations, themselves encrypted.

First we test the backup job by running a backup directly to the local drive, which will be a local copy. It should run significantly faster than going through PBS, so in some cases this could be the way to run backups, when you cannot afford to load memory or the network. Moreover, it is one way to keep separate backups on different mediums, provided of course that you have PBS on a separate drive, which is the case here. Now we can cross out the first requirement of a standard backup and data protection policy: to have separate physical copies.

Going a little bit further with the backup standards and policies, we create a second backup server, also running PBS. For the sake of the test we of course keep it in the same location, but in a real-world case it would be in a different physical location, even a different Proxmox cluster. The purpose is to back up to the main PBS and synchronize it to the second PBS server. This way we can achieve various levels of backup and data protection. We can differentiate backup retention between the master and the sync target, putting more copies wherever we have more space to dedicate to backups.

On the second PBS we create everything the same way. The first new thing is to add a Remote, which is our source server from which we will pull backup chunks. With the remote configured, we go to the Sync Jobs tab, where we define how often to synchronize and for how long to keep those synchronized backups. For both the main and the secondary backup server it is recommended to configure pruning on the server side. The same applies to garbage collection tasks. Remember that GC keeps chunks for a further 24 hours even after prune has gotten rid of the backup itself. That is the way it works, so with this in mind we need to do proper capacity planning in terms of the number of copies kept.

Now we have backed up locally and remotely, and synchronized those backups, giving 3 separate media. We should be safe for now, provided of course that the sync server is located outside this physical server.

Just as in Proxmox, here in PBS we can also configure 2FA with TOTP.

Fixing Ubuntu 22 camera recording

I tried to record video on my Dell G15 laptop using the built-in camera. Unfortunately, by default it does not work. I tried Cheese and Webcamoid with several different settings. Cheese did not work with either the built-in or the external camera. Webcamoid worked with the external camera only. Finally I decided to try OBS Studio, and the built-in camera was crashing that piece of software as well.

To fix this problem you could try to disable AppArmor:

sudo systemctl stop apparmor.service
sudo systemctl disable apparmor.service

If that fixes it, you can then try to find which AppArmor profile is responsible for blocking camera recording. For me, that was the case. Now I can record using the built-in camera.
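
To narrow it down to a specific profile instead of leaving AppArmor off entirely, the kernel log usually names the offender; a rough sketch:

# list loaded profiles
sudo aa-status
# reproduce the camera failure, then look for the profile named in the DENIED entries
sudo dmesg | grep -i 'apparmor="DENIED"'

A single profile can then be disabled with aa-disable from apparmor-utils instead of stopping the whole service.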

Encrypt with LUKS an unencrypted LVM Ubuntu 22 Server without system reinstallation

Keep your data safe. Device loss or unauthorized access can be mitigated by encrypting the drives in your servers and workstations.

So you may have an Ubuntu Linux installation on bare metal or in a virtual machine. Does it have an encrypted drive? If the answer is no, then you could be in trouble when the device is stolen or lost, or someone simply gains unauthorized access to your hardware. In this short step-by-step article you can see what steps you should take to encrypt your unencrypted drives without the need to reinstall the system.

When speaking of workstations, there is much less concern about system reinstallation. Just move out your data and configuration at a convenient time and proceed with a clean system installation, now with proper drive encryption configured. But hold on a second: if there is an option to encrypt without reinstalling your system, then why not just try it?

It is especially important when talking about server installations running production software handling customer data. You can opt for a system replacement in a maintenance window and redo all your work, but sometimes that is not an option. Actually, it does not matter what kind of unmaintained and obsolete software you run on your unencrypted servers; I think most of us know at least one example of such a thing. With such problematic software it is better to encrypt it as-is, without any additional steps required.

How to migrate data?

Here you can learn how to encrypt your existing data on LVM-based drives in an Ubuntu 22 server virtual machine. To get started you need to add an additional drive to your system with equal or more space than the unencrypted drive. Let's say your source is at /dev/sda and your spare drive is at /dev/sdb. I assume it is a default setup with 3 partitions: the first one is the GRUB spacer, the second is for /boot and the third is for the root filesystem.

Boot your VM with the GRML ISO. On Proxmox, when the VM is starting, press Esc and select the disc drive with the ISO mounted.

Once booted into GRML…

Create PV and extend existing VG with new drive:

pvcreate /dev/sdb
vgextend ubuntu-vg /dev/sdb

Move your data to new drive:

pvmove /dev/sda3 /dev/sdb

Remove the unencrypted source partition from the VG and remove its PV:

vgreduce ubuntu-vg /dev/sda3
pvremove /dev/sda3

Now it is time to wipe the existing unencrypted partition:

cryptsetup open --type plain -d /dev/urandom /dev/sda3 to_be_wiped
dd if=/dev/zero of=/dev/mapper/to_be_wiped bs=1M status=progress
cryptsetup close to_be_wiped

Now the most important aspect of the procedure: create a mountpoint for boot and write the LUKS header over there. It is critical to save this header on permanent storage. If the header with the keys is lost, then your data will be lost also. Keep this in mind:

mkdir /mnt/boot
mount /dev/sda2 /mnt/boot

Encrypt and open drive container:

cryptsetup -y luksFormat /dev/sda3 --header /mnt/boot/luksheader.img
cryptsetup luksOpen /dev/sda3 lvmcrypt --header /mnt/boot/luksheader.img

Create new PV and include it into existing VG.

pvcreate /dev/mapper/lvmcrypt
vgextend ubuntu-vg /dev/mapper/lvmcrypt

Move your data from additional spare drive into newly created lvmcrypt container:

pvmove /dev/sdb /dev/mapper/lvmcrypt

And finally, remove this new drive from VG and remove PV itself:

vgreduce ubuntu-vg /dev/sdb
pvremove /dev/sdb

How to update initramfs with detached LUKS2 header

So what is the deal with the LUKS2 header being detached? In this format, the first 16 MB of space is used for the header. If the original partition had to hold this header as well, it would not have enough space left for the data which needs to be moved back onto it. The second reason to have a detached header is to somewhat increase the security level. But remember that if you lose the device or filesystem holding the header, you permanently lose your data as well. So…

If you are booting once again into GRML:

mkdir -p /mnt/boot /mnt/luks
mount /dev/sda2 /mnt/boot
cryptsetup luksOpen /dev/sda3 lvmcrypt --header /mnt/boot/luksheader.img
vgchange -ay
mount /dev/mapper/ubuntu--vg-ubuntu--lv /mnt/luks

If you continue without rebooting, you can just create the mountpoint directory and mount it:

mkdir /mnt/luks
mount /dev/mapper/ubuntu--vg-ubuntu--lv /mnt/luks

Mount and bind necessary special directories and then chroot into the system:

mount -t proc proc /mnt/luks/proc
mount -t sysfs sys /mnt/luks/sys
mount -o bind /dev /mnt/luks/dev
mount --bind /run /mnt/luks/run
mount /dev/sda2 /mnt/luks/boot
chroot /mnt/luks /bin/bash

Now you are back in your Ubuntu, on your encrypted drive. Is it over? No. We need to tell the system at boot time where the LUKS2 header is stored. Copy your header onto any additional drive available. In case of a VM it could be a 0.1 GB drive; in case of a workstation it can be a USB pendrive:

dd if=/boot/luksheader.img of=/dev/sdb

Edit your /etc/crypttab file with the following:

lvmcrypt PARTUUID=A none luks,header=/dev/disk/by-uuid/B

where A is the PARTUUID of /dev/sda3 and B is the UUID of /dev/sdb, both obtainable with blkid.
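
To grab just the values, a small convenience sketch using standard blkid options:

blkid -s PARTUUID -o value /dev/sda3   # value for A
blkid -s UUID -o value /dev/sdb        # value for B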

Finally, regenerate initramfs:

update-initramfs -c -k all
exit
reboot

You’re good to go. Now your drive is encrypted and you will be asked for the password set earlier every time you boot the system. To be clear, you need to keep the additional drive holding the LUKS2 header safe. After you have booted the system, the drive or pendrive can be removed, but it needs to be inserted again on every further reboot.

Further reading

https://unix.stackexchange.com/questions/444931/is-there-a-way-to-encrypt-disk-without-formatting-it
https://www.michelebologna.net/2020/encrypt-an-existing-linux-installation-with-zero-downtime-luks-on-lvm/
https://dev.to/goober99/encrypt-an-existing-linux-installation-online-with-the-magic-of-lvm-1mjc
https://linuxconfig.org/how-to-use-luks-with-a-detached-header
https://medium.com/@privb0x23/lose-your-head-attempting-to-boot-from-luks-without-a-header-2d61174df360
https://askubuntu.com/questions/1351911/what-does-regenerate-your-initramfs-mean
https://superuser.com/questions/111152/whats-the-proper-way-to-prepare-chroot-to-recover-a-broken-linux-installation
https://unix.stackexchange.com/questions/720202/detached-luks-header-on-debian-based-gnu-linux

Create your own shortcut icon in the Ubuntu 22 panel

If you want to run some curl command on an icon click from the Ubuntu 22 panel, then you need to create a .desktop file in ~/.local/share/applications:

[Desktop Entry]
Encoding=UTF-8
Version=1.0
Type=Application
Terminal=false
Exec=curl http://x.x.x.x:yyyy/action
Name=Gateway
Icon=dialog-information

Then, after a few seconds, press the Super key (aka the Windows key) and type your “application” name. You can grab it and drag it onto the application panel to be shown as a shortcut.
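
If the entry does not show up, validate the file with desktop-file-validate from the desktop-file-utils package (the file name gateway.desktop is just an example):

desktop-file-validate ~/.local/share/applications/gateway.desktop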