Cloud Engineering weeknotes, 14 January 2022

This week demonstrated how the team has grown and matured: we abandoned our sprint. This is a rare event for an agile team, but the team recognised that the planned work wasn’t going to bring value, so we dropped it in favour of work that would. 

We focused on two main things. First, the firewalls and Globalprotect. We’ve implemented new authentication groups in Globalprotect; this is part of splitting Globalprotect into internal and external for apps hosted in our own AWS or SaaS. The IP ranges have also been corrected, as the 172 range we had been using was an antipattern. 

The big change in the firewalls is the implementation of a VPN to secure access to the management console. This increases the security around who can access the firewalls while also simplifying that security. The team got a crash course in how the firewall routing is configured on Wednesday, when we mobbed on an issue connected to the websites – the other main thing of the week.

The new Ansible infrastructure wasn’t communicating properly with the Hub so we had a big screen-sharing session with Cintia driving the work on the firewalls while Frank and Stuart debugged the Ansible and everyone else learned a lot. We’ve agreed we should do more sessions like this. 

There have been some other support-related tasks this week, such as supporting Public Health with a bulk file transfer, and setting up a way for Pensions to download data from S3. But a lot of our other planned work is still blocked. We had planned to roll the Housing-Production account onto the Hub last night, but found out in the late afternoon that a SaaS supplier hadn’t put the new IP addresses on their allow-list. This would have led to severe disruption elsewhere in the council, so we have postponed the change. 
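A pre-change check along the following lines could flag this kind of allow-list gap earlier. This is only a sketch, not the team's actual tooling: the function name `uncovered_addresses` is hypothetical, the addresses come from the reserved documentation ranges rather than any real council or supplier network, and it assumes the supplier can share their allow-list as a list of CIDR blocks.

```python
import ipaddress


def uncovered_addresses(new_egress, supplier_allow_list):
    """Return the new egress addresses that no allow-list entry covers."""
    # Parse each allow-list entry (single address or CIDR block) as a network.
    allowed = [ipaddress.ip_network(entry) for entry in supplier_allow_list]
    missing = []
    for address in new_egress:
        ip = ipaddress.ip_address(address)
        if not any(ip in network for network in allowed):
            missing.append(address)
    return missing


# Hypothetical values from the documentation ranges (RFC 5737).
new_egress = ["203.0.113.10", "203.0.113.11"]
supplier_allow_list = ["198.51.100.0/24", "203.0.113.10/32"]

# Any address printed here would be blocked by the supplier after the change.
print(uncovered_addresses(new_egress, supplier_allow_list))
```

An empty list would suggest the change is safe to proceed from the supplier's side; anything else is a reason to postpone, as happened here.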

In an effort to “clear the decks” of our work in progress and our blocked work, we have cleared out our backlog and in our next sprint are focusing only on those tasks in hand, plus a couple of new essential tasks. The more we can clear the slate, the more we will be able to focus on bringing value to the platform and its users. 

Cloud Engineering weeknotes, 7 January 2022

With the code freeze extending into the first week after Christmas, it’s been a quiet week. Or it was going to be, until a routine WordPress update went wrong. The team swarmed on this issue this morning, and as at the time of writing, all affected websites have been restored.

Before that happened, the team was regrouping after the break. We started with refreshing our memories of what we had been doing before Christmas, especially where that work had been handed over from one of our departed colleagues. However, because the bulk of that work – GIS apps migrations, account migrations – relates to production, we won’t be doing it till next week. 

We had a little reboot session in the week to pick up on some issues in our last retro. The outcome of this is that we’re going to start reserving time for learning each week – a Friday afternoon, though we will still respond to urgent issues, of course – and we will start to have a day in the office once a month. We’re also looking at how we can spread DevOps skills outside our team… more on this as we solidify ideas. It will take a little while for us to work out what our capacity is now, but this first week back has been a (relatively) gentle entry into the new year, with a new team shape. 

Cloud Engineering weeknotes, 23 December 2021

A short week, a quiet week, a week where we’re trying to tie up loose ends and be ready for January. This started with the account migrations; we’ve now moved Housing-Staging onto the Hub, and will move Housing-Prod after the code freeze. The GIS apps migration is in a similar position, with the staging environment having been tested and all found working. Although we had wanted to do production before Christmas, deploying on a Friday is never a good idea – and an even worse idea when that Friday is Christmas Eve. 

The websites migration is also going well, with all staging sites moved and working. We’ve been able to identify some sites that can be shut down (with a backup held), and we will be in a good position to move the production sites after the freeze. That will leave just the API accounts and we will finally be done. 

There has been other work this week. We’ve restricted the ability to create certain networking resources, and have implemented AWS Federation for GitHub Actions. Both of these changes will improve our security. 

We’ve also done some work on costs. We’ve helped two teams identify high or unusual costs in their accounts, and they’re taking action to address that. We’re also working on allowing access to the staging version of Mosaic through Globalprotect; at the moment it is accessed through AppStream, so making the switch before testing ramps up will avoid a lot of spend. It’s always good to save money. 

We say goodbye to most of our colleagues from Digi2al this week. We genuinely couldn’t have done this work without them, and we have all learned so much over the last year. Alliu, James, Tom, and Zoli, thank you so much for everything you’ve done. You will be missed. 

Cloud Engineering weeknotes, 17 December 2021

Another week of lots of under-the-hood progress, especially on the account migrations. Having been blocked for so long, the path is now clearing.

The work we needed on the application layer in the Housing accounts has been completed, and huge thanks to the MTFH Finance team for prioritising that. This means we are now able to roll both Staging and Production onto the Hub. The GIS apps are in a similar state: Staging apps have been migrated and the final testing is being done. We just need to move Production. 

Once these final migrations are done, we will be able to move the API accounts to the Hub later in January. We’ve identified some more business applications in the API accounts, and they can move later into their own account. On top of this, the Website migrations are going well, with the Staging version of Find Support Services deployed and working in the new Ansible container. We’ll move the remaining Staging versions shortly – performance improvements mean a deployment is now taking just 90 seconds. 

Beyond that, we’ve been plugging away on Globalprotect, documentation, and support. On Globalprotect we have hit a few bumps on having separate instances for internal and external applications, but we’re getting there and still have a few tricks to try. Progress on authentication groups is happening again, thanks to Mario for his work here. 

We are now getting ready for most of our Digi2al colleagues to roll off the project, so there’s a lot of close working on the migrations and a lot of documentation going on. We’ve done a lot over the last year and it’s time for a break, but please do be aware that from January we will have a smaller team. As a result, we will be going a bit more slowly, so please give us as much notice as possible if you need something from us. 

Cloud Engineering weeknotes, 10 December 2021

It passed me by that it was our first anniversary two weeks ago. It’s been quite the year, and although it feels like not a lot has happened over the last week, I’m not sure that any of us would have forecast the progress we have made in the last 54 weeks. I remember a workshop earlier this year, thinking that we had so much to do; it would be interesting to run that again to see how much we’ve moved on.

Although it’s good that we’ve been able to do so much for colleagues across the council, what I’m most proud of is the team we’ve built. Actually agile, focusing on value, flat, and dedicated to learning. We have a camaraderie that has sustained us, a culture that is healthy and welcoming. Long may this last.

Over this last week, the team has got some good stuff done, mostly to support other teams. We’ve built a VPN to connect to Servelec’s back end to enable a data migration; built some EC2 instances; set up networking for Social Care Finance; and are investigating ways to enable colleagues in HR to receive data from our payroll provider.

Some of the other work is being used as a catalyst, or maybe a test bed, for things we knew we’d need to do in future anyway. For example, we know that external services will need ingress access to our environment, so we have used requests from the Academy application manager and from the Data Platform team to work out how best to do this in a secure way, and how to automate it.

There’s been some progress on account migrations. The GIS apps are talking to each other but we need to set up a connection to the Addresses API as the final step. We’ve agreed a way forward on the Housing accounts with the Housing Finance team as well, and that should be unblocked early next week. Progress on the websites migrations has also come on strongly, with an AMI built in Packer and an ALB configured. We’ll migrate the first site in the staging environment shortly, to make sure it all works as expected.