Did you know that you can block AI-related web scrapers from downloading your whole website and effectively stealing your content? This way, LLMs will need a different data source for their training! Why, you may ask? First of all, AI companies make money on their LLMs, so using your content without paying you is simply stealing. This applies to text, images and sound. It is intellectual property, which has a certain value. A long time ago I placed an “Attribution-NonCommercial-NoDerivatives” license on my website and guess what… it does not matter. I did not receive any attribution. Dozens of various bot
Logging all HTTP traffic is often unnecessary. This applies especially to websites that include not only text content but also all kinds of additional components, such as JavaScript files, stylesheets, images, fonts etc. You can select what you would like to log inclusively, but it is much easier to do this by conditional negative selection. First define the log format, then create a conditional mapping, and finally specify the logger with a decision variable. For instance: This way we are not going to log any of the additional stuff and keep only regular pages in the log. This will be more useful for further traffic analysis
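The three steps described above (log format, conditional mapping, logger with a decision variable) could look roughly like the sketch below. This is only an illustration and not necessarily the configuration from the original post: the format name `pages`, the variable name `$loggable`, the list of file extensions and the log file path are all assumptions made for this example.

```nginx
# Place inside the http { } block.

# 1. Log format (the name "pages" is an assumption for this sketch).
log_format pages '$remote_addr - $remote_user [$time_local] '
                 '"$request" $status $body_bytes_sent '
                 '"$http_referer" "$http_user_agent"';

# 2. Conditional mapping: $loggable becomes 0 for static assets
#    (negative selection) and stays 1 for everything else.
#    The extension list is illustrative, adjust it to your site.
map $request_uri $loggable {
    default 1;
    ~*\.(?:js|css|png|jpe?g|gif|svg|ico|woff2?|ttf)(?:\?.*)?$ 0;
}

# 3. Logger with the decision variable: a request is written only
#    when $loggable is neither "0" nor an empty string.
access_log /var/log/nginx/pages.log pages if=$loggable;
```

The `if=` parameter of `access_log` skips a request whenever the condition evaluates to `0` or an empty string, which is why the map only has to list what should be excluded.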
Display geo location map for NGINX traffic logs in Kibana. Summary: There are 3 things to remember and configure in order to have the geo location map working: (1) use the “forwardfor” option on the pfSense HAProxy TLS frontend, (2) enable the Filebeat NGINX module and point it at the particular log files, (3) define a custom NGINX log format. This guide relates to an Ubuntu Linux setup. Elasticsearch 7: First install Elasticsearch 7 as follows. Note: for a more resilient setup, install more than one Elasticsearch server node and enable basic security. For the sake of clarity I will skip these two aspects, which will be covered in another article. Kibana: Then install
I will start with a weird experience with one of the WP themes, Polite and Polite Grid. I was wondering why my website made double requests on every page: one for the document and another for the content. This was annoying, as I was unable to measure traffic properly. It turned out to be caused by the theme I had been using for some time. Changing it to a different one fixed it. Second, to make the NGINX logs easier to handle, I created a separate location entry for all the WP things, so the “real” traffic goes only to particular log
The most recent min.io server release requires one additional thing in the configuration compared to versions from past years. Having min.io on one box and NGINX on another requires setting up a reverse proxy, which is a straightforward operation. You need to remember to add the proper headers to pass the hostname and scheme to the min.io box. This whole thing is described in the documentation. But… you are required to put the following into a min.io configuration file: This should be put in bold letters because without it you can upload artifacts into buckets, but will not be able to