Tag: Cassandra

Technology

IP2Location – complete IPv4 ranges ingestion

IP2Location is a IP address databases where you can find the latest IP to location associations. The complete IPv4 range is 4 294 967 296 addresses which is 32 bit. IP2Location contains 4 291 944 710 addresses which is a little less. However as much as 608 487 295 addresses come with no location set. It is because of: Above should not have location set as those are special IPv4 ranges. So valid commercial (and non-commercial also) use IPv4 addresses count should be somewhere near value of 3 683 457 415, which is all addresses minus addresses without location. DARPA

Technology

Cassandra performance tuning

From 8k to 29k writes per second We took IP2Location version DB11 database. It holds few millions of IPv4 ranges which should unwrap onto over 2 billion addresses. Such number of entries is actually not a big deal for PostgreSQL RDBMS or Apache Cassandra distributed databases system. However there is an issue of ingestion speed. The question is how quick I can programmatically compute IP addresses for IP ranges and insert them in persistant storage. PostgreSQL can hold easily around 10TB of data in single node. It can hold even more especially if divided into separate partitions/tables or use multiple

Technology

Casssandra: the introduction

Distributed, partitioned, multi-master, increment horizontal scale-out NoSQL data manegement system for global mission-critical use cases handling petabyte sized datasets Dynamo, Amazon, Facebook, Apache… “Reliability at massive scale is one of the biggest challenges we face at Amazon“ source: “Dynamo: amazon’s highly available key-value store” In 2004 there have been performance issues in Amazon e-commerce handling due to high traffic. By 2007 the concept of Dynamo has been materialized as Amazon S3. Then in 2008 Facebook with co-authors of Amazon Dynamo developed its own distributed NoSQL system, Cassandra. In 2009 Cassandra became Apache’s project. However, Amazon’s DynamoDB came in 2012, it