First Level of Optimization: 72.49% of CPU Power Required
The first step is similar to the one we take for license optimization. Moving databases to modern CPUs means you need fewer cores to run the same load at equal or better performance. Also, taking advantage of cloud flexibility, you can provision database servers (RDS, EC2, or others) based on today's peak load. In contrast, buying DB systems on-prem requires over-provisioning with at least 3-5 years of growth in mind, as adding capacity to a data center is not trivial.
We also have other means to reduce the initial CPU requirements. Based on each system's SLA, we can provision a Pilot Light DR - a reduced instance, reduced storage, or both. This means we can run smaller instances in the DR region, leading to lower power consumption. In the case of a planned or unplanned switchover, switching to a bigger instance takes a few minutes, and the process can be fully automated.
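As a rough illustration, the switchover scale-up can be scripted with a couple of boto3 calls. The region, instance identifier, and instance classes below are hypothetical placeholders, not values from the estate discussed in this article:

```python
import boto3

# Hypothetical identifiers and instance classes for illustration only.
DR_INSTANCE_ID = "appdb-dr"          # the right-sized Pilot Light replica
FAILOVER_CLASS = "db.r6i.4xlarge"    # production-sized class used after switchover

rds = boto3.client("rds", region_name="eu-west-1")

def scale_up_dr_instance(instance_id: str, target_class: str) -> None:
    """Resize the Pilot Light DR instance to its production class.

    ApplyImmediately avoids waiting for the next maintenance window,
    which is what keeps the switchover down to a few minutes.
    """
    rds.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        DBInstanceClass=target_class,
        ApplyImmediately=True,
    )
    # Block until the resize completes before redirecting application traffic.
    waiter = rds.get_waiter("db_instance_available")
    waiter.wait(DBInstanceIdentifier=instance_id)

if __name__ == "__main__":
    scale_up_dr_instance(DR_INSTANCE_ID, FAILOVER_CLASS)
```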
Using these techniques, we ended up with 30,122 target vCPUs. That is 27.51% fewer than the original thread count, and a correspondingly smaller carbon footprint - 751 MWh saved annually. And this is just the beginning.
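For transparency, here is a back-of-envelope reconstruction of this step in Python. The original core count and the per-vCPU energy figure are back-calculated from the numbers above; they are implied by the model rather than measured directly:

```python
# Back-of-envelope reconstruction of the first optimization step.
# The original core count and per-vCPU energy are implied by the
# article's figures (30,122 target vCPUs = 72.49% of the original;
# 751 MWh saved corresponds to the 27.51% that was removed).

target_vcpus = 30_122
remaining_fraction = 0.7249            # 72.49% of CPU power required

original_vcpus = target_vcpus / remaining_fraction
removed_vcpus = original_vcpus - target_vcpus

saved_mwh_per_year = 751
mwh_per_vcpu_year = saved_mwh_per_year / removed_vcpus

print(f"Implied original estate: ~{original_vcpus:,.0f} vCPUs")
print(f"vCPUs removed:           ~{removed_vcpus:,.0f}")
print(f"Implied energy per vCPU: ~{mwh_per_vcpu_year * 1000:.1f} kWh/year")
```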
Identification of Postgres Targets
Over the last two years, Cintra has built extensive experience migrating from legacy Oracle RDBMS to Postgres. Our procedures, processes, and tools are constantly improving.
Starting from version 3.7 of our Rapid Discovery tool, we can run significantly improved Postgres migration diagnostics on any Oracle database. The new, enhanced functionality aligns with the AWS methodology. We detect and evaluate more than 60 unique aspects of the database workload.
As a result, our assessment services can identify and categorize targets with increasing levels of Postgres migration complexity:
Low complexity - High automation. Minor, simple manual conversions may be needed.
Medium complexity - Medium automation. Low-medium complexity manual conversions may be needed.
High complexity - Low automation. Medium-high complexity manual conversions may be needed. Migration is possible but not recommended by default because of the high risk and cost of such a project.
Highest complexity. Or, as I call it, "forget-about-it" level of complexity. Here, we see databases using many Oracle-specific features or vast amounts of PL/SQL code. For example, we have seen a single database hosting more than 63 million lines of code.
Aside from assigning complexity scores to each database across big database estates, we can also point out the specific challenges each database will present (lots of incompatible data types here, usage of particular packages there, etc.).
Second Level of Optimization: 59.15% of CPU Power Required
So, how does Postgres migration help sustainability? There are two main gains right after the migration.
First, Postgres has had virtually unlimited multitenant capabilities since day one. This allows us to consolidate multiple smaller databases into one more efficient instance or cluster, with no license required and no artificial limitations. In contrast, Oracle - being Oracle - has limited usage to 3 PDBs per instance on non-Oracle public clouds. This is an excellent incentive for database modernization.
However, the more significant sustainability gain comes from the ability to run both RDS Postgres and Aurora Postgres on ARM-based CPUs. AWS Graviton-based instances use up to 60% less energy than comparable EC2 instances for the same performance.
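For illustration, putting a migrated Postgres target onto Graviton is largely a matter of choosing an ARM-based instance class at provisioning time. The sketch below uses boto3 with hypothetical names, sizes, and settings:

```python
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

# Hypothetical consolidation target: one Graviton-based (ARM) RDS Postgres
# instance absorbing several smaller Oracle databases after migration.
rds.create_db_instance(
    DBInstanceIdentifier="consolidated-pg-01",     # illustrative name
    Engine="postgres",
    DBInstanceClass="db.r7g.4xlarge",              # "g" = Graviton (ARM) instance family
    AllocatedStorage=500,                          # GiB, sized per workload
    MasterUsername="pgadmin",
    ManageMasterUserPassword=True,                 # let RDS keep the password in Secrets Manager
    MultiAZ=True,
)
```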
Our powerful Postgres identification engine has put 5,178 databases (59.89% of the total estate) in the Low and Medium migration complexity categories. Using the Consolidation module, based on our practical rules, we consolidate those to 2,810 instances with 11,074 cores. As those are Graviton instances, we can estimate a power saving equal to 5,537 source cores (assuming a more conservative 50% power saving). This saves a further 364 MWh per year and reduces CPU consumption to 59.16% of the initial number.
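The arithmetic behind this step can be sanity-checked with the per-vCPU energy figure implied earlier (an assumption derived from this article's own numbers, not a measurement):

```python
# Rough check of the Graviton consolidation step, reusing the
# ~65.7 kWh per x86 vCPU-year implied by the earlier figures.

graviton_vcpus = 11_074            # cores after consolidating 5,178 databases
graviton_power_factor = 0.50       # conservative 50% power saving vs x86

equivalent_x86_vcpus_saved = graviton_vcpus * graviton_power_factor
kwh_per_x86_vcpu_year = 65.7       # back-calculated assumption, not a measured value

saved_mwh = equivalent_x86_vcpus_saved * kwh_per_x86_vcpu_year / 1000
print(f"Equivalent x86 vCPUs saved: {equivalent_x86_vcpus_saved:,.0f}")   # ~5,537
print(f"Annual energy saved:        ~{saved_mwh:,.0f} MWh")               # ~364 MWh
```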
There are further potential gains that I am currently not factoring in. For example, 1% of the Postgres candidates are also candidates for Redshift migration. This may sound like a relatively low number, but those are some of the biggest and busiest databases. However, such an architecture change needs further evaluation based on non-DB factors, so I am not putting it in the model.
Third Level of Optimization: 54.04% of CPU Power Required
One of Aurora's magic tricks is the ability to run in a serverless fashion. This is great for workloads with unpredictable or wildly variable demands. Running serverless means you use close to the number of CPUs you actually need (within certain limitations), not the maximum you have provisioned. If a database goes idle for some configurable time (think of Dev/Test DBs during the night or at weekends), it can even go down to zero CPUs. The database service is resumed upon the first query to the endpoint.
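As a sketch, this is roughly what provisioning such a cluster looks like with boto3 and Aurora Serverless v2. The identifiers, engine version, and capacity range are illustrative assumptions; scaling all the way to zero ACUs also depends on running an engine version that supports automatic pause:

```python
import boto3

rds = boto3.client("rds", region_name="eu-west-1")

# Hypothetical Dev/Test cluster: Aurora Serverless v2 capacity is expressed
# in ACUs and scales between the configured minimum and maximum on demand.
rds.create_db_cluster(
    DBClusterIdentifier="devtest-aurora-pg",        # illustrative name
    Engine="aurora-postgresql",
    EngineVersion="16.4",                           # illustrative; pick a supported version
    MasterUsername="pgadmin",
    ManageMasterUserPassword=True,
    ServerlessV2ScalingConfiguration={
        "MinCapacity": 0.0,   # newer engine versions allow pausing to zero when idle
        "MaxCapacity": 32.0,  # cap sized to the workload's observed peak
    },
)

# Instances in the cluster use the special serverless instance class.
rds.create_db_instance(
    DBInstanceIdentifier="devtest-aurora-pg-1",
    DBClusterIdentifier="devtest-aurora-pg",
    Engine="aurora-postgresql",
    DBInstanceClass="db.serverless",
)
```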
One of the new data points we gather is the "workload variability". It allows us to identify candidates where serverless makes sense from a cost and power perspective.
Out of the databases identified as Aurora candidates, 50.09% have a variability of 2x or higher. Across this subset, the average saving from running serverless is estimated at 4.29x. In other words, a database like this, provisioned to constantly meet the peak demand, will occupy, on average, 4.29 times more CPU cores than required. If we run it as serverless, the actual CPU cores used will go up and down based on demand.
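To make the idea concrete, here is a toy version of such a variability metric, computed from CPU utilization samples as near-peak demand (95th percentile) divided by average demand. This is an illustrative sketch, not Rapid Discovery's actual logic:

```python
from statistics import mean, quantiles

def workload_variability(cpu_samples: list[float]) -> float:
    """Ratio of near-peak CPU demand to average demand.

    A toy metric: the 95th percentile stands in for the capacity you
    would have to provision, the mean for what you actually use.
    A value of 4.29 would mean the provisioned peak is ~4.29x the average.
    """
    p95 = quantiles(cpu_samples, n=100)[94]     # 95th percentile
    return p95 / mean(cpu_samples)

# Hypothetical hourly CPU-core usage for a spiky Dev/Test database:
# mostly idle, with short bursts during business hours.
samples = [1, 1, 1, 2, 2, 16, 18, 20, 4, 2, 1, 1] * 30
print(f"Variability: {workload_variability(samples):.2f}x")
```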
Adding this to the sustainability model, 50.09% of the Aurora candidates are provisioned as serverless. This means we save about 4,253 Graviton cores, bringing total CPU power consumption down to 1.474 GWh per year, or 54.04% of the initial number.
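Again as a rough reconstruction: treating the consolidated Graviton cores from the previous step as the pool of Aurora candidates reproduces the core saving quoted above to within rounding. This is an assumption I am making to show the arithmetic, not a figure stated by the model itself:

```python
# Rough reconstruction of the serverless step, assuming the 11,074
# consolidated Graviton cores are the Aurora candidate pool.

aurora_candidate_cores = 11_074
serverless_share = 0.5009          # candidates with 2x or higher variability
overprovisioning = 4.29            # average peak-to-average factor

serverless_cores = aurora_candidate_cores * serverless_share
cores_actually_needed = serverless_cores / overprovisioning
cores_saved = serverless_cores - cores_actually_needed

print(f"Graviton cores saved by going serverless: ~{cores_saved:,.0f}")  # ~4,254
```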
Fourth Level of Optimization: 51.86% of CPU Power Required
AWS ElastiCache allows organizations to run enterprise-grade in-memory database caches in a fully managed fashion. ElastiCache for Redis scales automatically and can serve hundreds of millions of requests per second. One crucial aspect of the Redis engine is that it is open source (no license required). AWS is among the major contributors to the project.
In the last few months, we have worked closely with the AWS ElastiCache team to enable Rapid Discovery to identify the potential for ElastiCache offloading among the assessed database estate.
We added more than a dozen new data points to be gathered during the initial discovery.
Based on the new data points, we have added specific business logic to identify potential candidates for ElastiCache offloading.
My real-life experience has shown me how a critical enterprise database running on Exadata can benefit significantly from offloading the most popular queries to an in-memory cache, cutting the CPU and/or IOPS requirements in half or even more. However, that is experience rather than precise measurement, and I want to base this article on real numbers.
Working with the AWS ElastiCache team, we designed a test case to measure the CPU benefits precisely. The results show that, for suitable workloads, offloading the most popular queries to Redis brings, on average, a 38.50% reduction in CPU demand. This includes both the Oracle and ElastiCache cores in the target architecture. As a bonus, the average response time is significantly better (the improvement can be an order of magnitude), and database reliability improves during high-load periods.
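The offloading pattern behind these measurements is essentially the classic cache-aside approach. Below is a minimal sketch using redis-py against an ElastiCache for Redis endpoint; the endpoint, table, query, and TTL are hypothetical placeholders, and db_conn is assumed to be an ordinary DB-API connection (psycopg2-style):

```python
import json
import redis  # redis-py works against an ElastiCache for Redis endpoint

# Hypothetical ElastiCache endpoint and TTL; real values depend on the
# cluster configuration and how fresh the cached data must be.
cache = redis.Redis(host="my-cache.xxxxxx.euw1.cache.amazonaws.com", port=6379)
CACHE_TTL_SECONDS = 300

def get_product_summary(db_conn, product_id: int) -> dict:
    """Cache-aside read: serve popular queries from Redis, fall back to the DB.

    Every cache hit is a query the relational database never burns CPU or
    IOPS on, which is the mechanism behind the CPU reduction measured above.
    """
    key = f"product_summary:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    # Cache miss: run the (hypothetical) expensive query once...
    with db_conn.cursor() as cur:
        cur.execute(
            "SELECT product_id, name, avg_rating FROM product_summary WHERE product_id = %s",
            (product_id,),
        )
        product_id, name, avg_rating = cur.fetchone()
    row = {"product_id": product_id, "name": name, "avg_rating": float(avg_rating)}

    # ...and store the result so subsequent reads never touch the database.
    cache.set(key, json.dumps(row), ex=CACHE_TTL_SECONDS)
    return row
```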
As per our statistics, about 7.83% of the database estate (both Oracle and Postgres) can benefit from ElastiCache. This means 677 databases running on 2,358 target vCPUs can benefit from an average 38.50% improvement, saving 908 vCPUs - a further 120 MWh of power saved per annum. Adding this to the sustainability model, we reach 1.4 GWh in total, or 51.86% of the initial CPU carbon footprint.
(I did not factor in the use of Graviton for ElastiCache, so the actual number is even better.)