AI/ML

Generate images with Stable Diffusion, Gemma and WebUI on NVIDIA GPUs

With Ollama paired with the Gemma3 model, Open WebUI with RAG and search capabilities, and finally Automatic1111 running Stable Diffusion, you can have quite a complete set of AI features at home for the price of 2 consumer-grade GPUs and some home electricity. With 500 iterations and an image size of 512×256 it took around a minute to generate a response. I find it funny to be able to generate images with AI techniques. I tried Stable Diffusion in the past, but now, with the help of Gemma and the integration with Automatic1111 in Open WebUI, it’s damn easy. Step by step Prerequisites You can find information
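
A minimal sketch of what such a generation request could look like against the Automatic1111 API, assuming the WebUI was started with --api and listens on the default 127.0.0.1:7860; the prompt is just a placeholder, and mapping the post's 500 iterations to sampling steps is my assumption:

    import base64
    import requests

    # Placeholder prompt; in my setup the prompt text comes from Gemma3 via Open WebUI.
    payload = {
        "prompt": "a cozy home lab with two GPUs, digital art",
        "steps": 500,     # the post's 500 iterations, interpreted here as sampling steps
        "width": 512,
        "height": 256,
    }

    # Automatic1111 must be running with its API enabled (--api).
    r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
    r.raise_for_status()

    # The API returns base64-encoded PNG images.
    for i, img_b64 in enumerate(r.json()["images"]):
        with open(f"output_{i}.png", "wb") as f:
            f.write(base64.b64decode(img_b64))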

AI/ML

Run DeepSeek-R1:70b on CPU and RAM

Utilize CPU, RAM and GPU computational resources together. With Ollama you can use not only the GPU but also the CPU with regular RAM to run LLM models, like DeepSeek-R1:70b. Of course, you need both a fast CPU and fast RAM, and plenty of it. My Lab setup contains 24 vCPUs (2 x 6 cores * 2 threads) and from 128 to 384 GB of RAM. Once started, Ollama allocates 22.4 GB of RAM (RES) and 119 GB of virtual memory. It reaches 1200% CPU utilization, causing the system load to go up to 12. However, total CPU utilization is only 50%. It
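
A rough sketch of sending a prompt to such a CPU/RAM-backed model through Ollama's HTTP API, assuming Ollama listens on the default localhost:11434 and deepseek-r1:70b has already been pulled:

    import requests

    # Ollama's generate endpoint; the model spills over to CPU/RAM when it does not fit into VRAM.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "deepseek-r1:70b",
            "prompt": "Explain pipeline parallelism in two sentences.",
            "stream": False,   # wait for the full answer instead of streaming tokens
        },
        timeout=3600,          # a 70b model on CPU can take a long time per response
    )
    resp.raise_for_status()
    print(resp.json()["response"])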

AI/ML

Ollama with Open WebUI on 2 x RTX 3060 12 GB

Ollama with WebUI on 2 “powerful” GPUs feels like the commercial GPTs online. I thought that Exo would do the job and utilize both of my Lab servers. Unfortunately, it does not work on Linux/NVIDIA with my setup when following the official documentation. So I went back to Ollama and found it great. I have 2 x NVIDIA RTX 3060 with 12 GB VRAM each, giving me 24 GB in total, which can run Gemma3:27b or DeepSeek-r1:32b. Ollama can utilize both GPUs in my system, which can be seen in nvidia-smi. How to run Ollama in Docker with GPU acceleration you can read
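
To check whether both cards are actually carrying load, a small helper like this sketch can poll nvidia-smi (it assumes the NVIDIA driver and nvidia-smi are installed on the host):

    import subprocess

    # Query per-GPU memory usage; with a model split across both RTX 3060s,
    # both rows should show significant memory.used.
    out = subprocess.check_output(
        [
            "nvidia-smi",
            "--query-gpu=index,name,memory.used,memory.total,utilization.gpu",
            "--format=csv,noheader",
        ],
        text=True,
    )
    for line in out.strip().splitlines():
        print(line)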

AI/ML

Exo: the GPU cluster (tinygrad | MLX)

Theory: running AI workloads spread across various devices using pipeline-parallel inference. In theory, Exo provides a way to run memory-heavy AI/LLM model workloads on many different devices, spreading memory and computation across them. They say: “Unify your existing devices into one powerful GPU: iPhone, iPad, Android, Mac, NVIDIA, Raspberry Pi, pretty much any device!” People say: “It requires mlx but it is an Apple silicon-only library as far as I can tell. How is it supposed to be (I quote) “iPhone, iPad, Android, Mac, Linux, pretty much any device”? Has it been tested on anything else than the

AI/ML

Object detection and scene description: various libraries/frameworks tested lately

No, you can’t use a Tesla K20Xm with 6 GB VRAM for modern computation, as it has a Compute Capability lower than the required 7.0. Here is a table of my findings about libraries/frameworks, required hardware and their purpose. I started with DeepStack, where I was able to run an API server for object detection; Frigate has support for it. Later on, with TensorRT on an NVIDIA GPU I can run the Yolov7x-640 model, also for object detection; Frigate works well with it. With a Google Coral TPU USB module we can run SSD MobileNet or EfficientDet models with great power efficiency at a good price. Ollama with moondream
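
A quick way to check whether a given NVIDIA card clears the Compute Capability 7.0 bar is a short PyTorch snippet like this sketch (it assumes PyTorch with CUDA support is installed):

    import torch

    REQUIRED = (7, 0)  # minimum Compute Capability expected by the frameworks above

    for i in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(i)
        cc = torch.cuda.get_device_capability(i)   # e.g. (3, 5) for a Tesla K20Xm
        verdict = "OK" if cc >= REQUIRED else "too old"
        print(f"GPU {i}: {name}, CC {cc[0]}.{cc[1]} -> {verdict}")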

AI/ML

Qwen LLM

What is Qwen? “This is the organization of Qwen, which refers to the large language model family built by Alibaba Cloud. In this organization, we continuously release large language models (LLM), large multimodal models (LMM), and other AGI-related projects. Check them out and enjoy!” What models do they provide? They have provided a wide range of models since 2023. The original model was just called Qwen and can still be found on GitHub. The current model, Qwen2.5, has its own repository, also on GitHub. General-purpose models are just Qwen, but there are also code-specific models. There are also Math, Audio and
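
As a minimal sketch (not from the post itself), one of the published Qwen2.5 instruct checkpoints could be tried locally with Hugging Face transformers; the model ID Qwen/Qwen2.5-7B-Instruct and the example prompt are just illustrations:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Qwen/Qwen2.5-7B-Instruct"  # one of the general-purpose instruct checkpoints
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    # device_map="auto" needs the accelerate package and enough GPU/CPU memory for the weights.
    model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

    messages = [{"role": "user", "content": "Give me a one-line summary of Qwen."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=64)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))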

Technology

Install Proxmox on Scaleway using Dell’s iDRAC

Last time (somewhere around 2023) there was an option on Scaleway to install Proxmox 7 directly from an appliance. There was also the possibility to use Debian 11 and install Proxmox on top of it. This time (Nov 2024) there is no direct install, and installing on Debian gives me some unexpected errors which I do not want to work around, as it should just work. But there is an option to use Dell’s iDRAC interface for remote access.

AI/ML

NVIDIA CC 7.0+: how to run Ollama/moondream:1.8b

Well, in one of the previous articles I described how to invoke Ollama/moondream:1.8b using cURL; however, I forgot to explain how to even run it in a Docker container. So here you go: You can run a particular model in the background (-d) or in the foreground (without the -d parameter). You can also define parallelism and the maximum queue size on the Ollama server. One important note regarding the stability of the Ollama server: once it runs for more than a few hours, there might be an issue with the GPU driver which requires a restart, so Ollama needs to be monitored for such a scenario. Moreover, after minutes of idle time it
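
A sketch of the same idea using the Docker SDK for Python instead of the docker CLI; the image name ollama/ollama and port 11434 are the defaults, OLLAMA_NUM_PARALLEL and OLLAMA_MAX_QUEUE are the server-side tuning variables I refer to, and the values here are only examples:

    import docker

    client = docker.from_env()

    # Start the Ollama server in the background (detach=True is the SDK equivalent of `docker run -d`),
    # with all NVIDIA GPUs passed through and basic parallelism/queue limits set.
    container = client.containers.run(
        "ollama/ollama",
        detach=True,
        name="ollama",
        ports={"11434/tcp": 11434},
        environment={
            "OLLAMA_NUM_PARALLEL": "2",   # example value: concurrent requests per model
            "OLLAMA_MAX_QUEUE": "128",    # example value: maximum queued requests
        },
        device_requests=[
            docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])
        ],
        volumes={"ollama": {"bind": "/root/.ollama", "mode": "rw"}},
    )

    # Pull the moondream model inside the running container; prompts can then go over the HTTP API.
    result = container.exec_run("ollama pull moondream:1.8b")
    print(result.output.decode())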

AI/ML

OpenVINO in AI computer vision object detection (Frigate + OpenVINO)

What is OpenVINO? “Open-source software toolkit for optimizing and deploying deep learning models.” It has been developed by Intel since 2018. It supports LLMs, computer vision and generative AI. It runs on Windows, Linux and macOS. As for Ubuntu, it is recommended to run it on 22.04 LTS or higher. It utilizes OpenCL drivers. In theory, libraries using OpenCL (such as OpenVINO) should be cross-platform, contrary to vendor-locked solutions like CUDA. In theory, OpenVINO should then work on both Intel and AMD hardware. The internet says that it works, but as for now I need to order some additional hardware to
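
A rough sketch of the OpenVINO Python API for loading a detector; the model path is hypothetical, and the "GPU" device string picks the OpenCL-backed Intel GPU plugin when one is present:

    import openvino as ov

    core = ov.Core()
    # List the inference devices OpenVINO can see on this machine, e.g. ['CPU', 'GPU'].
    print("Available devices:", core.available_devices)

    # Hypothetical IR model (e.g. an SSDLite detector exported for use with Frigate).
    model = core.read_model("model/ssdlite_mobilenet_v2.xml")

    # Prefer the OpenCL-based GPU plugin, fall back to CPU if no GPU device is found.
    device = "GPU" if "GPU" in core.available_devices else "CPU"
    compiled = core.compile_model(model, device)
    print(f"Model compiled for {device}")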