uncategorized

About me

Hello. I’m Michael, an IT professional and enthusiast. I graduated in computer programming as well as economics from Szkoła Główna Handlowa in Warsaw. I’m the author of 5 books on software design, development, quality assurance and performance. Since 2005 I have been working with many different companies, supporting them in various aspects of the software development process. I’m particularly interested in corporate architecture, but also in bootstrapping new startup ideas. My motto is “getting things done” and I like learning new things. Would you like to build something special? Contact me.

Technology

Cassandra performance tuning

From 8k to 29k writes per second

We took the IP2Location DB11 database. It holds a few million IPv4 ranges, which unwrap into over 2 billion addresses. Such a number of entries is not actually a big deal for the PostgreSQL RDBMS or the Apache Cassandra distributed database system. However, there is an issue of ingestion speed. The question is how quickly I can programmatically compute the IP addresses for each IP range and insert them into persistent storage. PostgreSQL can easily hold around 10TB of data on a single node. It can hold even more, especially if divided into separate partitions/tables or use multiple
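The core of the ingestion problem is turning each range into individual addresses before they are written. Below is a minimal Python sketch, assuming each range is given as a pair of integer IP numbers (ip_from, ip_to) as in the IP2Location CSV files; the function names and batching are illustrative, not the code used for the benchmark.

```python
import ipaddress
from itertools import islice
from typing import Iterable, Iterator


def expand_range(ip_from: int, ip_to: int) -> Iterator[str]:
    """Lazily yield every IPv4 address in the inclusive range [ip_from, ip_to]."""
    for n in range(ip_from, ip_to + 1):
        yield str(ipaddress.IPv4Address(n))


def batches(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Group an iterable into lists of at most `size` items (one insert batch each)."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def count_addresses(ranges: Iterable[tuple[int, int]]) -> int:
    """Total number of rows the ranges will unwrap into."""
    return sum(ip_to - ip_from + 1 for ip_from, ip_to in ranges)
```

For example, `expand_range(3232235776, 3232235778)` yields 192.168.1.0 through 192.168.1.2. Generating addresses lazily and handing fixed-size batches to the database driver keeps memory usage flat even when the total reaches billions of rows.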

AI/ML

Run Bielik LLM from SpeakLeash using LM Studio on your local machine

Did you know that you can run the Polish LLM Bielik from SpeakLeash locally, on your own computer? The easiest way to do this is with LM Studio (from lmstudio.ai). Why use a model locally? Just for fun, because there is no internet access where we are, or because we don’t want to share our data and conversations, etc. You can run it on macOS, Windows and Linux. It requires a CPU that supports AVX2 instructions, a large amount of RAM and, preferably, a modern dedicated graphics card. Note: for example, on a ThinkPad T460p with an i5 6300HQ and a dedicated 940MX card with 2GB VRAM, basically
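Before downloading a model, it is worth confirming that the CPU supports AVX2. Here is a minimal sketch for Linux only, assuming the /proc/cpuinfo format with a `flags` line; on macOS or Windows the system's own tools would be needed instead.

```python
from pathlib import Path


def cpu_flags(cpuinfo: str) -> set[str]:
    """Collect CPU feature flags from /proc/cpuinfo-style text (Linux format)."""
    flags: set[str] = set()
    for line in cpuinfo.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx2(cpuinfo: str) -> bool:
    # A set lookup avoids false positives such as matching "avx" or "avx512f".
    return "avx2" in cpu_flags(cpuinfo)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")
    if path.exists():
        print("AVX2 supported:", has_avx2(path.read_text()))
    else:
        print("/proc/cpuinfo not found (probably not Linux)")
```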

Technology

Recover pfSense 2.6 from kernel panic at ZFS freeing free segment

Recently my pfSense, which had been running on the same hardware for almost 3 years, died. I tried rebooting it and removing RAM, cards, etc., with no luck. So I decided to restore it from a configuration backup onto a new drive. But after a few days I started investigating the matter and found a temporary solution to get it booting again. Here is what the kernel panic looks like. It says: “Attempt to query device size failed” and “zfs: freeing free segment”. The latter is what prevents the system from starting up. First, select “3” to escape to the loader prompt: Then set: And you

Technology

Cassandra: the introduction

Distributed, partitioned, multi-master, incrementally scalable (horizontal scale-out) NoSQL data management system for global, mission-critical use cases handling petabyte-sized datasets. Dynamo, Amazon, Facebook, Apache… “Reliability at massive scale is one of the biggest challenges we face at Amazon” (source: “Dynamo: Amazon’s Highly Available Key-value Store”). In 2004 Amazon’s e-commerce platform had performance issues due to high traffic. By 2007 the concept of Dynamo had materialized as Amazon S3. Then, in 2008, Facebook, together with a co-author of Amazon’s Dynamo paper, developed its own distributed NoSQL system, Cassandra. In 2009 Cassandra became an Apache project. However, Amazon’s DynamoDB came in 2012, it
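“Partitioned” here means that every row key is hashed onto a token ring and each node owns a slice of that ring, an idea Cassandra inherited from Dynamo. Below is a simplified Python sketch, assuming MD5 as a stand-in hash (Cassandra's default partitioner uses Murmur3) and a hypothetical number of virtual nodes per machine; it is not Cassandra's actual implementation.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Stand-in hash: first 8 bytes of MD5 as an unsigned 64-bit token."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    def __init__(self, nodes: list[str], vnodes: int = 8) -> None:
        # Each physical node claims several positions (virtual nodes) on the ring.
        self._ring = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]
        self._nodes = set(nodes)

    def _start(self, key: str) -> int:
        # A node owns the range (previous token, its token], so take the first
        # token >= the key's token, wrapping around the end of the ring.
        return bisect.bisect_left(self._tokens, token(key)) % len(self._ring)

    def owner(self, key: str) -> str:
        return self._ring[self._start(key)][1]

    def replicas(self, key: str, rf: int) -> list[str]:
        """Walk clockwise from the key's position, collecting rf distinct nodes."""
        rf = min(rf, len(self._nodes))
        start = self._start(key)
        result: list[str] = []
        for step in range(len(self._ring)):
            node = self._ring[(start + step) % len(self._ring)][1]
            if node not in result:
                result.append(node)
                if len(result) == rf:
                    break
        return result
```

With a replication factor of 3, `replicas("user:42", 3)` walks clockwise from the key's token and picks three distinct machines, which is roughly how replicas are placed under Cassandra's SimpleStrategy.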

Technology

PostgreSQL manual partitioning

Have you ever wondered how many tables we can create and use in a PostgreSQL database server? Should we call them partitions or shards? Why not use the built-in “automatic” partitioning?

Partitions or shards?

Let’s first define the difference between partitions and shards. Partitions are placed on the same server, while shards can be spread across various machines. We can use inheritance or the more recent “automatic” partitioning. However, both of these solutions lead to tight coupling with the PostgreSQL RDBMS, which in some situations we would like to avoid. Imagine the prospect of migrating our schemas to a different RDBMS like Microsoft SQL
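Manual partitioning moves the routing decision from the database into application code, which is what keeps the schema portable. A minimal sketch, assuming hash-based routing over a fixed number of plain tables; the `events_pNNN` naming and the column layout are hypothetical.

```python
import zlib


def partition_for(key: str, partitions: int) -> int:
    # crc32 is stable across processes, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % partitions


def table_name(base: str, key: str, partitions: int) -> str:
    """Name of the plain table that should hold rows for this key."""
    return f"{base}_p{partition_for(key, partitions):03d}"


def create_statements(base: str, partitions: int) -> list[str]:
    # Plain CREATE TABLE statements with no PARTITION OF / INHERITS clauses,
    # so the same DDL can run on other RDBMS engines.
    return [
        f"CREATE TABLE {base}_p{i:03d} "
        "(id BIGINT PRIMARY KEY, item_key VARCHAR(255) NOT NULL, payload VARCHAR(4000))"
        for i in range(partitions)
    ]
```

The trade-off is that the application, not the database, must always compute the table name before reading or writing, and changing the number of partitions requires moving data.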

AI/ML

Block AI web-scrapers from stealing your website content

Did you know that you can block AI-related web scrapers from downloading your whole website and effectively stealing your content? This way, LLMs will need a different data source for their training! Why, you may ask? First of all, AI companies make money on their LLMs, so using your content without paying you is just stealing. This applies to texts, images and sounds. It is intellectual property which has a certain value. A long time ago I placed the “Attribution-NonCommercial-NoDerivatives” license on my website and guess what… it does not matter. I did not receive any attribution. Dozens of various bot
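One common approach is a robots.txt file that disallows known AI crawlers by their user-agent token. Below is a minimal sketch that generates such a file and checks it with Python's standard `urllib.robotparser`; the crawler list is an assumption based on tokens published by their operators and should be verified against current vendor documentation.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler tokens (OpenAI, Common Crawl, Google AI training, Anthropic).
AI_CRAWLERS = ["GPTBot", "CCBot", "Google-Extended", "ClaudeBot"]


def build_robots_txt(agents: list[str]) -> str:
    """One Disallow-all group per AI crawler, then allow everyone else."""
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


def is_allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Check how a compliant crawler would interpret the file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    print(build_robots_txt(AI_CRAWLERS))
```

Keep in mind that robots.txt is a request, not an enforcement mechanism: only crawlers that honor it will stay away.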

Technology

How to build a computer inside a computer?

Ever wondered how a computer is built? And no, I’m not talking about unscrewing your laptop… but about exactly how things happen inside the CPU. If so, then check out TINA from Texas Instruments and open my custom-made all-in-one computer. I spent a few weeks preparing this schematic. It contains a clock, program counter, memory address register, RAM, ALU, A & B registers, instruction register, microcode decoder and address register. Well, that’s a lot of stuff you need to build an 8-bit data and 4-bit address computer, even in a simulator. Sample program in my assembly + binary representation, which needs to be
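To see how those blocks cooperate, here is a tiny software model of an 8-bit data, 4-bit address machine with 16 bytes of RAM and A and B registers. The opcodes follow the classic SAP-1 convention (LDA, ADD, SUB, OUT, HLT) and are an assumption, not necessarily the instruction set of the TINA schematic.

```python
# Assumed SAP-1-style opcodes: upper 4 bits = opcode, lower 4 bits = address.
LDA, ADD, SUB, OUT, HLT = 0x0, 0x1, 0x2, 0xE, 0xF


def run(ram: list[int], max_steps: int = 100) -> list[int]:
    """Execute a program stored in 16 bytes of RAM and return the OUT values."""
    if len(ram) != 16:
        raise ValueError("4-bit addresses give exactly 16 RAM cells")
    pc, a, outputs = 0, 0, []
    for _ in range(max_steps):
        ir = ram[pc]                        # fetch: MAR <- PC, IR <- RAM[MAR]
        pc = (pc + 1) & 0xF                 # 4-bit program counter wraps around
        opcode, operand = ir >> 4, ir & 0xF
        if opcode == LDA:
            a = ram[operand]
        elif opcode in (ADD, SUB):
            b = ram[operand]                # B register feeds the ALU
            a = (a + b if opcode == ADD else a - b) & 0xFF  # 8-bit ALU result
        elif opcode == OUT:
            outputs.append(a)
        elif opcode == HLT:
            return outputs
        else:
            raise ValueError(f"unknown opcode {opcode:#x} at address {pc - 1 & 0xF}")
    raise RuntimeError("step limit reached without HLT")
```

For example, the program `[0x0E, 0x1F, 0xE0, 0xF0]` loads RAM[14], adds RAM[15], outputs the sum and halts; with 28 and 14 stored in the last two cells it outputs 42.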

AI/ML

BLOOM LLM: how to use?

Asking BLOOM-560M “what is love?”, it replies: “The woman who had my first kiss in my life had no idea that I was a man”. wtf?!

Intro

I’ve been into parallel computing since 2021, playing with OpenCL (you can read about it here) and trying to get the most out of devices’ capabilities. I’ve got pretty decent in-depth knowledge of how computation works on GPUs, and I’m curious how the most recent AI/ML/LLM technology works. And here you have my little introduction to the LLM topic from a practical point of view.

Course of Action

What is BLOOM? It is a BigScience Large Open-science Open-access Multilingual language
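A common way to query BLOOM-560M from Python is the Hugging Face `transformers` library with the `bigscience/bloom-560m` checkpoint. The sketch below assumes `transformers` and `torch` are installed and that the weights can be downloaded on first use; the sampling parameters are illustrative, not the settings used for the answer quoted above.

```python
def strip_prompt(prompt: str, generated: str) -> str:
    """Causal LMs return prompt + continuation; keep only the continuation."""
    if generated.startswith(prompt):
        return generated[len(prompt):].lstrip()
    return generated


def ask(prompt: str, model_name: str = "bigscience/bloom-560m", max_new_tokens: int = 40) -> str:
    # Imported lazily so the helper above works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tokenizer(prompt, return_tensors="pt")
    # Sampling makes answers vary from run to run (and occasionally odd).
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, do_sample=True, top_p=0.9
    )
    text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
    return strip_prompt(prompt, text)
```

Calling `print(ask("what is love?"))` downloads the model once and prints a sampled continuation; with `do_sample=False` the output becomes deterministic (greedy decoding).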