Every database migration must go through the standard data migration phases, such as compatibility analysis and schema validation. Two challenges are worth mentioning: the speed at which we could dump and reload over a billion rows (nearly 1 TB), and planning a migration that would let us react to unexpected situations by rolling forward to a safe environment with minimal downtime and no data loss.
We spent quite some time figuring out how to load data into Aurora faster. The usual bottleneck is the data export/import, and the import has to finish before the source's binlog is rotated away, otherwise the new database has no position left to start replicating from and can never catch up. After exploring different options on our own, we turned to AWS support for a second opinion, and they referred us to mydumper and myloader. By exploiting the elasticity of AWS, and RDS Aurora in particular, we were able to scale up, finish the data load within hours instead of days, and then scale back down for normal usage.
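For illustration, this is roughly what the parallel dump, temporary scale-up, and bulk load could look like. The hostnames, credentials, database name, thread counts, and instance classes below are placeholder assumptions, not our exact settings; a sketch, not the precise commands we ran.

```
# Dump the source database in parallel with a consistent snapshot.
# mydumper records the binlog coordinates of the snapshot in the
# metadata file inside the output directory.
mydumper \
  --host source-db.example.com \
  --user migrator \
  --password "$SRC_PASSWORD" \
  --database appdb \
  --threads 16 \
  --rows 500000 \
  --compress \
  --outputdir /data/dump

# Temporarily scale the Aurora writer up for the bulk load
# (instance identifier and classes are hypothetical).
aws rds modify-db-instance \
  --db-instance-identifier aurora-writer \
  --db-instance-class db.r4.8xlarge \
  --apply-immediately

# Load the dump into Aurora in parallel.
myloader \
  --host aurora-cluster.cluster-xxxx.eu-west-1.rds.amazonaws.com \
  --user migrator \
  --password "$DST_PASSWORD" \
  --database appdb \
  --threads 16 \
  --queries-per-transaction 1000 \
  --directory /data/dump

# Scale back down to the normal instance class once the load is done.
aws rds modify-db-instance \
  --db-instance-identifier aurora-writer \
  --db-instance-class db.r4.xlarge \
  --apply-immediately
```

The point of the scale-up is that the load is a one-off burst: paying for a large instance class for a few hours is what turns a multi-day import into one that beats the binlog retention window.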
The second challenge, live migration with a roll-forward path in case of disaster, was a rather complicated task. First, we created the new Aurora database as a replica of the current database. Once it had caught up with replication, we created another database running the old DB engine as a replica of the new Aurora database, as the diagram shows. It looks simple on a PowerPoint slide, but in reality there are bits and bytes and some dark magic involved. Thanks to extensive testing and planning, we managed to perform the live migration without obstacles, and after a couple of days the roll-forward replica was decommissioned as well.
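As a sketch, such a replication chain can be wired up on RDS/Aurora with the mysql.rds_* stored procedures. All endpoints, credentials, and binlog coordinates below are hypothetical; the coordinates in step 1 would come from the mydumper metadata file, and the old-engine replica in step 2 requires binary logging to be enabled on the Aurora cluster.

```
# 1) Make the new Aurora cluster a replica of the current database.
#    Hostnames, user, password, and binlog file/position are placeholders.
mysql -h aurora-cluster.cluster-xxxx.eu-west-1.rds.amazonaws.com -u admin -p <<'SQL'
CALL mysql.rds_set_external_master(
  'source-db.example.com', 3306,
  'repl_user', 'repl_password',
  'mysql-bin.000123', 4567, 0);
CALL mysql.rds_start_replication;
SQL

# 2) Once Aurora has caught up, make an instance running the old engine
#    a replica of Aurora, giving us a safe roll-forward target.
mysql -h rollforward-db.xxxx.eu-west-1.rds.amazonaws.com -u admin -p <<'SQL'
CALL mysql.rds_set_external_master(
  'aurora-cluster.cluster-xxxx.eu-west-1.rds.amazonaws.com', 3306,
  'repl_user', 'repl_password',
  'mysql-bin-changelog.000001', 4, 0);
CALL mysql.rds_start_replication;
SQL

# Watch replication lag on either replica before cutting traffic over:
mysql -h aurora-cluster.cluster-xxxx.eu-west-1.rds.amazonaws.com \
  -u admin -p -e 'SHOW SLAVE STATUS\G'
```

With this chain in place, cutover is just repointing the application at Aurora once lag reaches zero; if Aurora misbehaves, traffic rolls forward to the old-engine replica, which has been receiving every write all along, so no data is lost.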