Cloud Engineering weeknotes, 21 January 2022

A quiet week, which is just as well: we have been a few people down, making our already-small team even smaller.

We’ve still got some stuff done, mostly on the support side of things – permissions, DNS changes, and restoring the connection between an EC2 instance and an S3 bucket for the Document Migration team. We also had a demo of the new HaloITSM system. We will be using this for all support requests from 31 January, so please note that requests via Slack will not be picked up from that date.

Tomasz and Cintia continue work on the firewalls, and specifically on Globalprotect. The work to split Globalprotect into two for “internal” and “external” applications is nearing completion. Cintia has also been supporting Frank with networking on the production Ansible infrastructure for websites. We always learn more than we think we have. 

The other main piece of work this week has been preparations to replace the wildcard SSL certificate. It expires in a few weeks, and we want to replace it with AWS-issued certificates, which will renew automatically. Thanks to AWS Config, Matt was able to track down all usage of the wildcard in AWS in record time. We’re now planning how and when to do the replacement exercise.

However, AWS Certificate Manager doesn’t allow the certificates it issues to be exported, so to deal with services outside AWS that use our certificate, we’ll use a different method. We did a lunch and learn on this during the week.
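For anyone curious what tracking down usage of the wildcard looks like in practice, here is a minimal sketch of the filtering step. It is not the method Matt used: the record shape, field names, account names, and certificate ARN are all made up for illustration, and it assumes a resource inventory has already been exported into plain Python dictionaries.

```python
from collections import defaultdict

# Hypothetical ARN standing in for the wildcard certificate (an assumption).
WILDCARD_CERT_ARN = "arn:aws:acm:eu-west-2:111111111111:certificate/wildcard-example"


def group_wildcard_usage(resources, cert_arn=WILDCARD_CERT_ARN):
    """Return {account: [resource ids]} for resources that reference cert_arn."""
    usage = defaultdict(list)
    for resource in resources:
        # "certificate_arns" is an assumed field name, not a real AWS Config key.
        if cert_arn in resource.get("certificate_arns", []):
            usage[resource["account"]].append(resource["resource_id"])
    # Sort the ids so the replacement plan reads the same on every run.
    return {account: sorted(ids) for account, ids in usage.items()}


# Illustrative records only; real inventory data would look different.
inventory = [
    {"account": "account-a", "resource_id": "load-balancer-1",
     "certificate_arns": [WILDCARD_CERT_ARN]},
    {"account": "account-a", "resource_id": "load-balancer-2",
     "certificate_arns": ["arn:aws:acm:eu-west-2:111111111111:certificate/other"]},
    {"account": "account-b", "resource_id": "distribution-1",
     "certificate_arns": [WILDCARD_CERT_ARN]},
]

wildcard_usage = group_wildcard_usage(inventory)
```

Grouping by account gives a per-account checklist, which is one way to plan the replacement exercise in stages.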

Although our plans for this sprint are to clear out the work in progress, much remains blocked. One of our suppliers is being slow to respond to a request, which in turn is delaying important work for Repairs Hub. We think this is now unblocked, but a second supplier is also being unresponsive on work for Social Care. This will be escalated. 

Cloud Engineering weeknotes, 14 January 2022

This week demonstrated how the team has grown and matured: we abandoned our sprint. This is a rare event for an agile team, but the team recognised that the planned work wasn’t going to bring value, so we dropped it in favour of work that would.

We focused on two main things. First, the firewalls and Globalprotect. We’ve implemented new authentication groups in Globalprotect; this is part of splitting Globalprotect into internal and external instances, for apps hosted in our own AWS accounts or provided as SaaS. The IP ranges have also been corrected, as the 172 range we had been using was an antipattern.

The big change in the firewalls is the implementation of a VPN to secure access to the management console. This increases the security around who can access the firewalls while also simplifying that security. On Wednesday the team got a crash course in how the firewall routing is configured, when we mobbed on an issue connected to the websites – the other main thing of the week.

The new Ansible infrastructure wasn’t communicating properly with the Hub, so we had a big screen-sharing session: Cintia drove the work on the firewalls, Frank and Stuart debugged the Ansible, and everyone else learned a lot. We’ve agreed we should do more sessions like this.

There have been some other support-related tasks this week, such as supporting Public Health with a bulk file transfer, and setting up a way for Pensions to download data from S3. But a lot of our other planned work is still blocked. We had planned to roll the Housing-Production account onto the Hub last night, but found out in the late afternoon that a SaaS supplier hadn’t put the new IP addresses on their allow-list. This would have led to severe disruption elsewhere in the council, so we have postponed the change. 

In an effort to “clear the decks” of our work in progress and our blocked work, we have cleared out our backlog and in our next sprint are focusing only on those tasks in hand, plus a couple of new essential tasks. The more we can clear the slate, the more we will be able to focus on bringing value to the platform and its users. 

Cloud Engineering weeknotes, 7 January 2022

With the code freeze extending into the first week after Christmas, it’s been a quiet week. Or it was going to be, until a routine WordPress update went wrong. The team swarmed on this issue this morning and, at the time of writing, all affected websites have been restored.

Before that happened, the team was regrouping after the break. We started by refreshing our memories of what we had been doing before Christmas, especially where that work had been handed over from one of our departed colleagues. However, because the bulk of that work – GIS apps migrations, account migrations – relates to production, we won’t be doing it till next week.

We had a little reboot session in the week to pick up on some issues in our last retro. The outcome of this is that we’re going to start reserving time for learning each week – a Friday afternoon, though we will still respond to urgent issues, of course – and we will start to have a day in the office once a month. We’re also looking at how we can spread DevOps skills outside our team… more on this as we solidify ideas. It will take a little while for us to work out what our capacity is now, but this first week back has been a (relatively) gentle entry into the new year, with a new team shape. 

Cloud Engineering weeknotes, 23 December 2021

A short week, a quiet week, a week where we’re trying to tie up loose ends and be ready for January. This started with the account migrations; we’ve now moved Housing-Staging onto the Hub, and will move Housing-Prod after the code freeze. The GIS apps migration is in a similar position, with the staging environment having been tested and all found working. Although we had wanted to do production before Christmas, deploying on a Friday is never a good idea – and an even worse idea when that Friday is Christmas Eve. 

The websites migration is also going well, with all staging sites moved and working. We’ve been able to identify some sites that can be shut down (with a backup held), and we will be in a good position to move the production sites after the freeze. That will leave just the API accounts and we will finally be done.

There has been other work this week. We’ve restricted the ability to create certain networking resources, and have implemented AWS Federation for GitHub Actions. Both of these changes will improve our security. 
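As a rough illustration of what federation for GitHub Actions usually involves, here is a sketch of an IAM role trust policy that lets a workflow assume a role using GitHub’s OIDC tokens rather than long-lived access keys. This assumes the OIDC approach and is not our actual configuration; the account ID, repository name, and function name are placeholders.

```python
import json

# Host name of GitHub's OIDC token issuer for Actions.
GITHUB_OIDC_HOST = "token.actions.githubusercontent.com"


def github_actions_trust_policy(account_id, repository):
    """Build a trust policy limiting role assumption to one GitHub repository."""
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{GITHUB_OIDC_HOST}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": provider_arn},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    # The token must have been issued for AWS STS.
                    "StringEquals": {f"{GITHUB_OIDC_HOST}:aud": "sts.amazonaws.com"},
                    # The token must come from a workflow in this repository.
                    "StringLike": {f"{GITHUB_OIDC_HOST}:sub": f"repo:{repository}:*"},
                },
            }
        ],
    }


# Placeholder account ID and repository, for illustration only.
policy = github_actions_trust_policy("111111111111", "example-org/example-repo")
policy_json = json.dumps(policy, indent=2)
```

The security gain comes from the conditions: only tokens issued to the named repository can assume the role, and there are no stored credentials to leak or rotate.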

We’ve also done some work on costs. We’ve helped two teams identify high or unusual costs in their accounts, and they’re taking action to address them. We’re also working on allowing access to the staging version of Mosaic through Globalprotect; at the moment access is through AppStream, so making the switch before testing ramps up will let us avoid a lot of spend. It’s always good to save money.

We say goodbye to most of our colleagues from Digi2al this week. We genuinely couldn’t have done this work without them, and we have all learned so much over the last year. Alliu, James, Tom, and Zoli, thank you so much for everything you’ve done. You will be missed. 

Cloud Engineering weeknotes, 17 December 2021

Another week of lots of under-the-hood progress, especially on the account migrations. Having been blocked for so long, the path is now clearing.

The work we needed on the application layer in the Housing accounts has been completed; huge thanks to the MTFH Finance team for prioritising that. This means we are now able to roll both Staging and Production onto the Hub. The GIS apps are in a similar state: Staging apps have been migrated and the final testing is being done. We just need to move Production.

Once these final migrations are done, we will be able to move the API accounts to the Hub later in January. We’ve identified some more business applications in the API accounts, and they can move later into their own account. On top of this, the Website migrations are going well, with the Staging version of Find Support Services deployed and working in the new Ansible container. We’ll move the remaining Staging versions shortly – performance improvements mean a deployment is now taking just 90 seconds. 

Beyond that, we’ve been plugging away on Globalprotect, documentation, and support. On Globalprotect we have hit a few bumps on having separate instances for internal and external applications, but we’re getting there, with a few tricks still to try. Progress on authentication groups is happening again; thanks to Mario for his work here.

We are now getting ready for most of our Digi2al colleagues to roll off the project, so there’s a lot of close working on the migrations and a lot of documentation going on. We’ve done a lot over the last year and it’s time for a break, but please do be aware that from January we will have a smaller team. As a result, we will be going a bit more slowly, so please give us as much notice as possible if you need something from us.