Block AI web-scrapers from stealing your website content

Did you know that you can block AI-related web scrapers from downloading your whole website and effectively stealing your content? That way, LLMs will have to find a different data source for their training!

Why, you may ask? First of all, AI companies make money on their LLMs, so using your content without paying you is simply theft. That applies to texts, images and sounds alike: it is intellectual property with a certain value. A long time ago I put an “Attribution-NonCommercial-NoDerivatives” license on my website and guess what… it does not matter. I did not receive any attribution. Dozens of various bots visit my website and just download all the content. So I decided…

… to block those AI-related web-crawling and web-scraping bots. And no, not by modifying the robots.txt file (or any XML sitemaps), as that may not be sufficient for some Chinese bots which simply “don’t give a damn”. Nor did I want to use any plugins or server extensions. I decided to go the hard way:

location / {
  if ($http_user_agent ~* "Bytespider") { return 403; }
  ...
}

And then decide exactly which HTTP User-Agents (the client “browser”, in other words) I would like to show the middle finger to. For those who do not stare at server logs for at least a few minutes a day: “Bytespider” is a scraping bot from ByteDance, the company that owns TikTok. It is said that this bot possibly downloads content to feed some Chinese LLM. Chinese or US, it does not really matter: if you would like to use my content, either pay me or attribute your usage of it. How, you may ask? To be honest, I do not know.
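
Once the list of unwanted crawlers grows, a map in the http context keeps the location block tidy. A minimal sketch; apart from Bytespider, the User-Agent strings below (GPTBot, CCBot, ClaudeBot) are just examples of well-known AI crawlers, so adjust the list to whatever shows up in your own logs:

# http context: flag requests coming from AI crawlers
map $http_user_agent $blocked_ai_bot {
    default       0;
    ~*Bytespider  1;
    ~*GPTBot      1;
    ~*CCBot       1;
    ~*ClaudeBot   1;
}

# inside the existing server block
location / {
    if ($blocked_ai_bot) { return 403; }
}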

There is either the hard way (blocking certain User-Agents in NGINX, as above) or the diplomatic way, which could lead to a catalogue of websites that do not want to take part in the AI feeding process for free. I think there are many more content creators out there who would like to get a piece of the AI birthday cake…

Conditional Nginx logging

Logging all HTTP traffic is often unnecessary. That is especially true for websites that serve not only text content but also all kinds of additional assets: JavaScript, stylesheets, images, fonts and so on. You could select what to log inclusively, but it is much easier to do it by conditional negative selection. First define a log format, then create a conditional mapping, and finally configure the logger with the decision variable. For instance:

log_format mylogformat '$remote_addr - $remote_user [$time_local] '
                       '"$request" $status $body_bytes_sent '
                       '"$http_referer" "$http_user_agent" "$gzip_ratio"';

map $request_uri $loggable {
    default                                             1;
    ~*\.(ico|css|js|gif|jpg|jpeg|png|svg|woff|ttf|eot)$ 0;
}

access_log /var/log/nginx/access.log mylogformat if=$loggable;

This way none of those additional assets are logged and only the regular pages end up in the log, which is far more useful for further traffic analysis than filtering such entries out manually.

Geo location with Filebeat on Elasticsearch 7, HAProxy and NGINX

Display a geo location map for NGINX traffic logs in Kibana

Summary

There are three things to remember and configure in order to get the geo location map working:

  • Use the “forwardfor” option on the pfSense HAProxy TLS frontend
  • Enable the Filebeat NGINX module and point it at the relevant log files
  • Define a custom NGINX log format

This guide assumes an Ubuntu Linux setup.

Elasticsearch 7

First install Elasticsearch 7 as follows.

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
sudo apt-get install apt-transport-https
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
sudo apt-get update && sudo apt-get install elasticsearch

Note: for a more resilient setup, install more than one Elasticsearch server node and enable basic security. For the sake of clarity I skip these two aspects here; they will be covered in another article.

Kibana

Then install Kibana to have a UI for Elasticsearch; if it runs on the same host, the repository is already in place and only the install command is needed:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
sudo apt-get install apt-transport-https
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/7.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-7.x.list
sudo apt-get update && sudo apt-get install kibana

Configuration

Now be sure to configure both Elasticsearch and Kibana. For Elasticsearch the configuration file is /etc/elasticsearch/elasticsearch.yml. Be sure to set the following (replace 0.0.0.0 with your local IP address):

network.host:   0.0.0.0
http.port:      9200
discovery.type: single-node

To enable and start the Elasticsearch server:

sudo systemctl enable elasticsearch
sudo service elasticsearch start

Now configure Kibana. The configuration file is /etc/kibana/kibana.yml. Again, replace “0.0.0.0” with your local IP address:

server.port: 5601
server.host: "0.0.0.0"
elasticsearch.hosts: ["http://0.0.0.0:9200"]

Enable and start the Kibana server:

sudo systemctl enable kibana
sudo service kibana start

Be sure to check that both Elasticsearch and Kibana are up and running with service or systemctl, as shown below. If everything is fine, proceed to set up Filebeat and Packetbeat on the client hosts; there is a separate guide covering that. In the Beats configuration files, point to the Elasticsearch server you have just installed.
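
A quick sanity check could look like this (addresses as configured above; adjust them to your local IP):

systemctl status elasticsearch kibana
curl http://0.0.0.0:9200       # should return a JSON banner with cluster information
curl -I http://0.0.0.0:5601    # Kibana should answer with an HTTP status line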

HAProxy on pfSense

The example setup includes a gateway, which is pfSense with the HAProxy package. Configure two frontends: one for HTTP on port 80 and one for HTTPS on port 443. On the HTTP frontend configure:

Action http-request redirect
rule: scheme https

On the HTTPS frontend configure the certificate and check SSL Offloading. Of course, you first need to load that certificate under System, Cert. Manager. Configure your backends and select the appropriate one on the HTTPS frontend. Then, still on the HTTPS frontend, check the following:

Use "forwardfor" option

This is required to be able to read the client IP in the backend NGINX.
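
For reference, outside of the pfSense GUI the same settings correspond roughly to the following raw haproxy.cfg directives (frontend names, certificate path and backend address are made up for this sketch):

frontend http-in
    bind :80
    http-request redirect scheme https

frontend https-in
    bind :443 ssl crt /etc/haproxy/your-cert.pem
    option forwardfor
    default_backend nginx-backend

backend nginx-backend
    server web1 192.168.1.10:80 check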

Filebeat on client host

On the client hosts install the filebeat package; there is a separate guide for that. Edit the configuration file, which is /etc/filebeat/filebeat.yml, and disable the default input:

filebeat.inputs:
- type: log
  enabled: false

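The same file also needs an output section pointing at the Elasticsearch server installed earlier (the address below is an assumption; use your local IP):

output.elasticsearch:
  hosts: ["http://0.0.0.0:9200"]
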
Then enable the NGINX module and run the setup:

filebeat modules enable nginx
filebeat setup

Now you are almost ready to deliver log files, but first you need to point to them in the module configuration at /etc/filebeat/modules.d/nginx.yml:

- module: nginx
  access:
    enabled: true
    var.paths: ["/var/log/nginx/app.log"]

You can now enable and start filebeat:

sudo systemctl enable filebeat
sudo service filebeat start
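
Before relying on it, it is worth letting Filebeat validate its own configuration and the connection to Elasticsearch:

sudo filebeat test config
sudo filebeat test output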

NGINX

In the http section of the configuration file /etc/nginx/nginx.conf add a new log format:

log_format mydefault '$http_x_forwarded_for - $remote_user [$time_local] '
                     '"$request" $status $body_bytes_sent '
                     '"$http_referer" "$http_user_agent"';

You can use it in the application configuration file /etc/nginx/conf.d/app.conf, in the server stanza:

access_log  /var/log/nginx/app.log mydefault;
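
For context, a minimal server stanza could look like this (the server name and web root are hypothetical; TLS is already terminated on HAProxy, so the backend listens on plain HTTP):

server {
    listen 80;
    server_name app.example.com;                  # hypothetical name
    access_log  /var/log/nginx/app.log mydefault;

    location / {
        root  /var/www/app;                       # or proxy_pass to your application
        index index.html;
    }
}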

Restart your NGINX server and go to Kibana to explore your data. This custom log format is needed to capture the real client IP, which arrives in the $http_x_forwarded_for variable once HAProxy adds it; apart from that, it stays as close as possible to the default format.

Network geo location map

If everything went fine, that is installing Elasticsearch, Kibana and the Beats on your client hosts and configuring HAProxy and NGINX, then you can open the Security, Explore, Network section and hit the refresh button to load data into the map. But first you need to select Data sources (a link above the map, on the right side) and include the filebeat-* index pattern.

Fig. Filebeat data source selection

With such configuration you should be able to see geo points representing client locations.

Fig. Location points based on filebeat data

If you also enable Packetbeat, you will additionally see network information data below the map.

Fig. Networking information data

Please remember that enabling Packetbeat generates tons of data in complex environments, so be sure to allocate enough space for these indices.

Summary

This guide covered the basic path of installing Elasticsearch, Kibana and Beats, and of configuring HAProxy and NGINX to deliver traffic logs to Elasticsearch, so that the traffic can be visualized as geo location points.

WordPress quirks and features

I will start with a weird experience with one of the WP themes, Polite and Polite Grid. I was wondering why my website made two requests on every page view: one for the document and another for the content. This was annoying, as I was unable to measure traffic properly. It turned out to be caused by the theme I had been using for some time; switching to a different one fixed it.

Secondly, to make the NGINX logs easier to handle, I created separate location entries for all the WP internals, so the “real” traffic goes to one particular log and everything else goes somewhere else. Take a look:

# WordPress internals (wp-admin, wp-content, wp-json, ...)
location ~ ^/wp- {
  access_log /var/log/nginx/michalasobczak.pl.wordpress.access.log;
  ...
}
location ~ ^/favicon {
   ...
}
# the "real" traffic
location / {
  access_log /var/log/nginx/michalasobczak.pl.access.log;
  ...
}

I can now push these logs into Elasticsearch through the Filebeat module and no longer have to bother with all the WP-specific entries that clutter the NGINX logs.
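
For example, the Filebeat NGINX module from the earlier guide can then be pointed only at the “real” traffic log:

- module: nginx
  access:
    enabled: true
    var.paths: ["/var/log/nginx/michalasobczak.pl.access.log"]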

min.io server behind NGINX reverse-proxy

The most recent min.io server release requires one additional thing in the configuration compared to releases from past years. Having min.io on one box and NGINX on another requires setting up a reverse proxy, which is a straightforward operation. You need to remember to add the proper headers to pass the hostname and scheme to the min.io box. The whole thing is described in the documentation.

proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
proxy_set_header Host $http_host;

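In context, the proxy configuration could look roughly like this (the upstream address is an assumption; unlimited body size and disabled buffering are commonly recommended for object storage behind a proxy):

location / {
    client_max_body_size 0;                  # allow large object uploads
    proxy_buffering off;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_set_header Host $http_host;
    proxy_pass http://192.168.1.20:9000;     # assumed address of the min.io box
}
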
But… you are required to put the following into a min.io configuration file:

MINIO_SERVER_URL="https://your.url/"

This should be written in bold letters, because without it you can upload artifacts into buckets but will not be able to download them via a share link, due to a checksum error: the checksum depends on the domain name. The default min.io installation offers the console and the API on one port only in theory; if you go for the console, you are redirected to some temporary port, valid only until the next restart of the server. So the API will work flawlessly on a public domain, but the console in the default installation will not. Please keep that in mind.
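
A possible workaround, assuming the packaged systemd service which reads its environment from /etc/default/minio, is to pin the console to a fixed port as well (the port number below is just an example) and expose or proxy that port explicitly:

MINIO_SERVER_URL="https://your.url/"
MINIO_OPTS="--console-address :9001"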