uncategorized

About me

Hello. I’m Michael, an IT professional and enthusiast. I graduated in computer programming as well as economics from Szkoła Główna Handlowa in Warsaw. I’m the author of 5 books concerning software design, development, quality assurance and performance. Since 2005 I have been working with many different companies, supporting them in various aspects of the software development process. I’m specifically interested in corporate architecture, but also in bootstrapping new startup ideas. My motto is “getting things done” and I like to learn new things. Would you like to build something special? Contact me.

Technology

IP2Location – complete IPv4 ranges ingestion

IP2Location is an IP address database where you can find the latest IP-to-location associations. The complete IPv4 range is 4 294 967 296 addresses, which is the full 32-bit address space. IP2Location contains 4 291 944 710 addresses, which is a little less. However, as many as 608 487 295 addresses come with no location set. It is because of: The above should not have a location set, as those are special IPv4 ranges. So the count of valid IPv4 addresses in commercial (and also non-commercial) use should be somewhere near 3 683 457 415, which is all addresses in the database minus the addresses without a location. DARPA …
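The arithmetic behind these figures can be checked with a short sketch. The constant names are made up for illustration; the numbers are the ones quoted in the excerpt.

```python
# Figures quoted in the excerpt above; the variable names are illustrative.
TOTAL_IPV4 = 2 ** 32                  # full 32-bit IPv4 address space
IP2LOCATION_COVERED = 4_291_944_710   # addresses present in the database
WITHOUT_LOCATION = 608_487_295        # addresses with no location set

# Addresses that do carry a location: database coverage minus unlocated ones.
with_location = IP2LOCATION_COVERED - WITHOUT_LOCATION

# Addresses of the full IPv4 space that the database does not list at all.
missing_from_db = TOTAL_IPV4 - IP2LOCATION_COVERED

print(TOTAL_IPV4)       # 4294967296
print(with_location)    # 3683457415
print(missing_from_db)  # 3022586
```

Note that the 3 683 457 415 figure is derived from the database coverage, not from the full 2^32 space.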

Technology

Cassandra performance tuning

From 8k to 29k writes per second. We took the IP2Location DB11 database. It holds a few million IPv4 ranges, which should unwrap into over 2 billion addresses. Such a number of entries is actually not a big deal for the PostgreSQL RDBMS or the Apache Cassandra distributed database system. However, there is an issue of ingestion speed. The question is how quickly I can programmatically compute IP addresses for IP ranges and insert them into persistent storage. PostgreSQL can easily hold around 10TB of data on a single node. It can hold even more, especially if divided into separate partitions/tables or use multiple …
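Computing addresses for a range could look roughly like the sketch below. It assumes each range is given as a pair of inclusive integers (start and end of the range). The function names `expand_range` and `batched` are hypothetical, and the actual insert into PostgreSQL or Cassandra is left out on purpose.

```python
import ipaddress
from typing import Iterable, Iterator, List

def expand_range(ip_from: int, ip_to: int) -> Iterator[str]:
    """Yield every IPv4 address of an inclusive integer range in dotted form."""
    for value in range(ip_from, ip_to + 1):
        yield str(ipaddress.IPv4Address(value))

def batched(addresses: Iterable[str], size: int) -> Iterator[List[str]]:
    """Group addresses into lists of at most `size` items for bulk writes."""
    batch: List[str] = []
    for address in addresses:
        batch.append(address)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the last, possibly shorter, batch
        yield batch

# 16777216 is 1.0.0.0 as an integer, so this range covers 1.0.0.0 to 1.0.0.3.
for chunk in batched(expand_range(16777216, 16777219), size=2):
    print(chunk)
```

Generators keep memory use flat, which matters when a few million ranges unwrap into billions of rows; the batch size is a tuning knob, not a recommendation.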

AI/ML

Run Bielik LLM from SpeakLeash using LM Studio on your local machine

Did you know that you can use the Polish LLM Bielik from SpeakLeash locally, on your private computer? The easiest way to do this is with LM Studio (from lmstudio.ai). Why use a model locally? Just for fun. Where we don’t have internet. Because we don’t want to share our data and conversations, etc. You can run it on macOS, Windows and Linux. It requires support for AVX2 CPU instructions, a large amount of RAM and, preferably, a dedicated and modern graphics card. Note: for example, on a ThinkPad T460p with an i5-6300HQ and a dedicated 940MX card with 2GB of VRAM basically …

Technology

Recover pfSense 2.6 from kernel panic at ZFS freeing free segment

Recently my pfSense, running on the same hardware for almost 3 years, died. I tried rebooting it and removing RAM, cards etc., with no luck. So I decided to restore it from a configuration backup onto a new drive. But after a few days I started investigating this matter and found a temporary solution to start it back up. Here is what the kernel panic looks like. It says: “Attempt to query device size failed” and “zfs: freeing free segment”. The latter is the cause of the problem with the system starting up. First, select “3” to escape to loader prompt: Then set: And you …

Technology

Cassandra: the introduction

Distributed, partitioned, multi-master NoSQL data management system with incremental horizontal scale-out, for global mission-critical use cases handling petabyte-sized datasets. Dynamo, Amazon, Facebook, Apache… “Reliability at massive scale is one of the biggest challenges we face at Amazon” (source: “Dynamo: Amazon’s Highly Available Key-value Store”). In 2004 there were performance issues in Amazon e-commerce handling due to high traffic. By 2007 the concept of Dynamo had materialized as Amazon S3. Then in 2008 Facebook, together with co-authors of Amazon Dynamo, developed its own distributed NoSQL system, Cassandra. In 2009 Cassandra became an Apache project. However, Amazon’s DynamoDB came in 2012, it …

Technology

PostgreSQL manual partitioning

Have you ever wondered how many tables we can create and use in a PostgreSQL database server? Shall we call them partitions or shards? Why not use the built-in “automatic” partitioning? Partitions or shards? Let’s first define the difference between partitions and shards. Partitions are placed on the same server, but shards can be spread across various machines. We can use inheritance or the more recent “automatic” partitioning. However, both of these solutions lead to tight coupling with the PostgreSQL RDBMS, which in some situations we would like to avoid. Imagine the prospect of migrating our schemas to a different RDBMS like Microsoft SQL …
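One way to keep partitioning independent of the database engine is to route rows to plain tables in application code. The sketch below is only an illustration under that assumption: `partition_name`, `create_statements`, the modulo routing rule and the four-digit suffix are all hypothetical and not taken from the post.

```python
from typing import List

def partition_name(base: str, key: int, partitions: int) -> str:
    """Map an integer key to the name of the plain table that should hold it."""
    if partitions <= 0:
        raise ValueError("partitions must be positive")
    return f"{base}_{key % partitions:04d}"

def create_statements(base: str, columns_sql: str, partitions: int) -> List[str]:
    """Build one ordinary CREATE TABLE statement per manual partition."""
    return [
        f"CREATE TABLE {base}_{index:04d} ({columns_sql});"
        for index in range(partitions)
    ]

print(partition_name("events", 42, 16))  # events_0010
for statement in create_statements("events", "id BIGINT, payload TEXT", 2):
    print(statement)
```

Because the generated DDL uses only ordinary tables, the same routing logic would carry over to another RDBMS with little change; the trade-off is that queries spanning partitions must be assembled by the application.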

AI/ML

Block AI web-scrapers from stealing your website content

Did you know that you may block AI-related web scrapers from downloading your whole website and actually stealing your content? This way LLMs will need a different data source for their learning process! Why, you may ask? First of all, AI companies make money on their LLMs, so using your content without paying you is just stealing. It applies to texts, images and sounds. It is intellectual property which has a certain value. A long time ago I placed an “Attribution-NonCommercial-NoDerivatives” license on my website and guess what… it does not matter. I did not receive any attribution. Dozens of various bot …
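The excerpt does not say which blocking method the post uses, so the following is only a sketch of one common approach: rejecting requests by their User-Agent header. The token list is a small, incomplete example (GPTBot, CCBot and ClaudeBot are crawler names published by their operators), and `is_ai_scraper` and `status_for` are hypothetical helpers.

```python
# Illustrative, incomplete list of crawler tokens; extend it from your own logs.
AI_SCRAPER_TOKENS = ("gptbot", "ccbot", "claudebot")

def is_ai_scraper(user_agent: str) -> bool:
    """Return True when the User-Agent header contains a known crawler token."""
    lowered = user_agent.lower()
    return any(token in lowered for token in AI_SCRAPER_TOKENS)

def status_for(user_agent: str) -> int:
    """Pick an HTTP status: 403 for listed crawlers, 200 for everyone else."""
    return 403 if is_ai_scraper(user_agent) else 200

print(status_for("Mozilla/5.0 (compatible; GPTBot/1.0)"))            # 403
print(status_for("Mozilla/5.0 (X11; Linux x86_64) Firefox/126.0"))  # 200
```

This only stops crawlers that identify themselves honestly; a scraper sending a browser-like User-Agent would pass through.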

Technology

How to build a computer inside a computer?

Ever wondered how a computer is built? And no, I’m not talking about unscrewing your laptop… but about exactly how things happen inside the CPU. If so, then check out TINA from Texas Instruments and open my custom-made all-in-one computer. I spent a few weeks preparing this schematic. It contains a clock, program counter, memory address register, RAM, ALU, A and B registers, instruction register and microcode decoder. Well, that’s a lot of stuff you need to build a computer with 8-bit data and 4-bit addresses, even in a simulator. Sample program in my assembly + binary representation, which needs to be …