IP2Location – complete IPv4 ranges ingestion
IP2Location is an IP address database that provides up-to-date IP-to-location associations. The complete IPv4 space holds 4 294 967 296 addresses (2^32, because an address is 32 bits wide). IP2Location covers 4 291 944 710 of them, which is slightly fewer. However, as many as 608 487 295 of those addresses come with no location set. This is largely because of the following special-purpose ranges:
- 0.0.0.0/8 ("this network")
- 10.0.0.0/8 (private, class A)
- 100.64.0.0/10 (shared)
- 127.0.0.0/8 (host)
- 169.254.0.0/16 (link-local)
- 172.16.0.0/12 (private, class B)
- 192.0.0.0/24 (IETF protocol assignments, e.g. dual-stack lite)
- 192.0.2.0/24 (documentation etc)
- 192.88.99.0/24 (reserved, IPv6 to IPv4 relay)
- 192.168.0.0/16 (private, class C)
- 198.18.0.0/15 (benchmarking)
- 198.51.100.0/24 (documentation etc)
- 203.0.113.0/24 (documentation etc)
- 224.0.0.0/4 (multicast, class D)
- 233.252.0.0/24 (documentation etc)
- 240.0.0.0/4 (reserved, class E)
- 255.255.255.255/32 (broadcast)
The ranges above should not have a location set because they are special-purpose IPv4 ranges. The number of IPv4 addresses usable for commercial and non-commercial purposes should therefore be close to 3 683 457 415, which is the number of addresses in the database minus the addresses without a location (4 291 944 710 minus 608 487 295). IPv4, developed under DARPA, was specified in 1981, and the central IANA pool of free addresses was exhausted in 2011.
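The arithmetic above can be checked with a short script. This is only a sketch using Python's standard `ipaddress` module. The names `SPECIAL_RANGES`, `special_total` and `located` are made up for this illustration, and the script assumes the database marks exactly the listed ranges as having no location, which the article does not confirm.

```python
import ipaddress

# Special-purpose ranges exactly as listed in this article.
SPECIAL_RANGES = [
    "0.0.0.0/8", "10.0.0.0/8", "100.64.0.0/10", "127.0.0.0/8",
    "169.254.0.0/16", "172.16.0.0/12", "192.0.0.0/24", "192.0.2.0/24",
    "192.88.99.0/24", "192.168.0.0/16", "198.18.0.0/15", "198.51.100.0/24",
    "203.0.113.0/24", "224.0.0.0/4", "233.252.0.0/24", "240.0.0.0/4",
    "255.255.255.255/32",
]

# Figures quoted in the article.
IN_DATABASE = 4_291_944_710
NO_LOCATION = 608_487_295

networks = [ipaddress.ip_network(cidr) for cidr in SPECIAL_RANGES]

# collapse_addresses() merges adjacent networks and drops nested ones
# (233.252.0.0/24 and 255.255.255.255/32 sit inside larger listed blocks),
# so no address is counted twice.
collapsed = list(ipaddress.collapse_addresses(networks))
special_total = sum(net.num_addresses for net in collapsed)

# Addresses that should carry a location.
located = IN_DATABASE - NO_LOCATION

print(f"full IPv4 space:          {2 ** 32:>13,}")
print(f"listed special ranges:    {special_total:>13,}")
print(f"without location (given): {NO_LOCATION:>13,}")
print(f"with location:            {located:>13,}")
```

Under these assumptions the listed ranges add up to 592 708 864 addresses once overlaps are removed. That is a little below the 608 487 295 figure, so some addresses without a location presumably fall outside the list.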
How does this apply to Cassandra databases?
The complete set of about 3.6B addresses weighs around 40 GB on a single Apache Cassandra 5.0 node. On commodity hardware with an Intel Core i5-10200H and a Western Digital SN530 NVMe drive we can get up to 29k inserts per second. Doing the math, the complete ingestion should finish in about 35 hours (3 683 457 415 / 29 000 is roughly 127 000 seconds). However, if we split the writes across multiple Cassandra nodes, it would most probably be much faster. Let's say we run at 100k inserts per second: ingestion would then take about 10 hours. At 1M inserts per second it would take only about 1 hour.
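The ingestion estimates above are a plain division, sketched below. The helper `ingestion_hours` is a made-up name for this illustration. The sketch assumes one insert per address and a constant insert rate for the whole run, both of which are simplifications rather than measured behavior.

```python
# Number of addresses with a location, as computed earlier in the article.
ADDRESSES = 3_683_457_415


def ingestion_hours(rows: int, inserts_per_second: int) -> float:
    """Return the hours needed to insert `rows` rows at a constant rate."""
    return rows / inserts_per_second / 3600


# 29k/s is the single-node rate quoted above; the other two rates are the
# hypothetical multi-node rates discussed in the text.
for rate in (29_000, 100_000, 1_000_000):
    hours = ingestion_hours(ADDRESSES, rate)
    print(f"{rate:>9,} inserts/s -> {hours:5.1f} h")
```

This gives roughly 35.3, 10.2 and 1.0 hours, in line with the rounded figures in the text.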