Cloud Engineering weeknotes, 18 February 2022

Not much progress this week, as several of the team have been taking some much-deserved rest. The rest of the team has also slowed down a little, which honestly is a good thing, as going as quickly as we did through 2021 is just not sustainable and leads to mistakes. 

Erdem and Ninad have made good progress on moving the first of three covid grant application services. This has been quite a learning curve for both but Staging is almost complete. We don’t believe that Production is in use, so that can be moved fairly quickly thereafter. After finishing this application, they will tackle the other two separately, though working together. 

There is still a short list of applications to be moved: GIS apps in Production, and Repairs Hub in Staging. However, the latter is probably better done by the Repairs Hub team and we’ll be talking to them separately about this. 

Matt and Stuart have been learning more about the Palo Alto firewalls from Tomasz. This week they covered how to set up a VPN; one will be needed soon for a third-party system that will connect to Repairs Hub. The prep work has been done and we should be able to actually build it next week. Stuart will also update the existing VPN to Servelec as the encryption domains need to be extended. 

Tomasz has also done a spike into restoring access to an app called Plus5, used by Planning. It’s a very old application that requires an emulator as it only supports telnet, and we think we have a way forward using HostAccess on an EC2. We’re trying to get a trial licence to test it. 

Stuart is finishing up his work on Backstage, completing documentation for the Playbook. We don’t want to be gatekeepers for this, and any developer should feel free to add a plugin that would be useful. He’s also writing a paper for the TDA on how we could offer this out to other councils as a packaged service.

Something new this week was Ninad giving Chris L some 1:1 coaching on Terraform, starting from first principles. Chris is keen to be able to maintain the Academy account himself, and we’re more than happy to help him! This is certainly something we’d welcome more of, and if anyone would like to have access to Devscratch to teach themselves using HashiCorp’s documentation, just let us know. 

Cloud Engineering weeknotes, 11 February 2022

This week has again been all about certificates. With most AWS services having had their new certificates applied earlier in the week, we had turned our attention to the wildcard. Unfortunately, our supplier still has not provided the new certificate, so we developed a Plan B of using Let’s Encrypt to generate a temporary certificate. This has worked quite well, and has been passed to the relevant teams to use.

However, it’s temporary and we will need to keep working on this. Let’s Encrypt can generate certificates automatically, which fits our automation policy, but they are only valid for three months at a time and would still have to be applied manually. This is not an acceptable overhead, so we will explore other options. 
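
To illustrate the overhead, here is a minimal sketch of the date arithmetic behind a three-month certificate. It is not something we run. The 90-day lifetime is the Let’s Encrypt default; the 30-day renewal window and the function names are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

# Let's Encrypt certificates are valid for 90 days (roughly three months).
CERT_LIFETIME = timedelta(days=90)
# Assumption for this sketch: renew once fewer than 30 days remain.
RENEW_WINDOW = timedelta(days=30)


def expiry_date(issued_at: datetime) -> datetime:
    """Return the moment a certificate issued at `issued_at` stops being valid."""
    return issued_at + CERT_LIFETIME


def renewal_due(issued_at: datetime, now: datetime) -> bool:
    """Return True once `now` falls inside the renewal window before expiry."""
    return now >= expiry_date(issued_at) - RENEW_WINDOW


if __name__ == "__main__":
    # Hypothetical issue date, chosen only for illustration.
    issued = datetime(2022, 2, 11, tzinfo=timezone.utc)
    print("Expires:", expiry_date(issued).date())
    print("Renewal due now:", renewal_due(issued, datetime.now(timezone.utc)))
```

With a 30-day window, each certificate would need attention about every 60 days, which is the manual overhead we want to avoid.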

Although this has been a necessary distraction, some good progress has been made elsewhere. We almost have the first iteration of Backstage, which should be in production next week. It’s deliberately basic, with only the GitHub plugin so far, but any dev is free to add more. We gave a short demo of some of the features at our show & tell this week. 

We’ve restarted some work on account migrations. The API accounts have some applications in them that should be in better homes. We’ve started the process of migrating them, starting with the Development environment. Although we’ve migrated applications before, this is the first time we’ll have moved a serverless application, so it’s also a good learning opportunity. 

On Globalprotect, we have agreed with the relevant governance groups that we will use a different authentication method for “internal” applications. This is because Globalprotect can’t process granular authentication data from our normal SSO provider. We have raised this as a feature request with Palo Alto Networks; until that is available, we will be using an alternative service that meets the same standards.

It was good to engage so productively with our governance, with the action being agreed within a day of asking. Governance doesn’t have to be a blocker to doing the right things; rather, it’s helped reassure us that we’re doing the right thing in the right way!

Cloud Engineering weeknotes, 4 February 2022

A quieter week this week, as we’ve had some more sick absence. That will improve next week when one of our number returns from leave. 

The focus this week has been on renewing our SSL certificates. For anything hosted in AWS we are now using certificates signed by Amazon and managed through AWS Certificate Manager. These are free, and will renew automatically. Certificates for most services have now been created and the details passed to developers for integration. Meanwhile, we are buying a new wildcard for externally-hosted services, and will look to improve how that is managed in the coming year. 

Our work on Globalprotect has moved on well this week. With the completion of work to split Globalprotect into “internal” and “external”, we’ve been doing some tidying up. Qlik is now being tested by the Data & Insight team. We are almost ready with Webreg, which will allow Customer Service Agents to check basic details on the electoral register, taking a lot of pressure off the Elections team in the run-up to this year’s local elections. 

Unfortunately, due to absence and support work, our other work has been slow. That said, we do now have a version of Backstage in staging, which is being populated with the GitHub plugin showing all our repos. We’ll get this into production soon and would encourage everyone to look at the other plugins, and even add them if you wish. I’m particularly interested in the Costs plugin, which will show us how much each account costs in real time. Our most costly account might not be what you think!

Cloud Engineering weeknotes, 28 January 2022

“The apron strings have been cut, and we had to sink or swim” was an observation at our retro this week. I think it’s fair to say that we’re swimming. The team is showing a skill and maturity that we probably didn’t think we had, or at least didn’t want to admit. This is reflected in some of the choices we’ve made recently – lots of pairing and knowledge-sharing, and the decision this week to abandon sprints in favour of Kanban. We weren’t really doing Scrum anyway.

The week started well, with the migration of the final Housing account to the Hub. This went smoothly, with the exception of losing a connection to Qlik. This was fixed quickly the next morning, and users experienced no disruption. We can now plan the final set of migrations – the API accounts. A lot of the preparation work has been done already, and again we will start with Dev to make sure it goes smoothly. 

Work on Globalprotect has also come on well this week. We now have authentication groups rolled out, and again users should have noticed no difference. This is part of splitting Globalprotect into “internal apps” (i.e. hosted in our own AWS) and “external” (i.e. SaaS). We completed the separation this week, with internal and external being routed through different firewalls. We can now complete the work to move Qlik onto Globalprotect, taking it out of AppStream and saving some money. 

We’ve had a lot of support requests this week, some of them complex. It is of course good to help other teams, but the trade-off has been that work to complete the website migration has been pretty much blocked all week. Fortunately, this is not blocking anything in turn, but it does mean that new work is having to wait. 

One thing that we have rebooted, however, is Backstage. This has been bouncing around for some time, but we’re finishing the job as part of clearing our decks. It’s been moved to an ECS container rather than running on an EC2, and it’s been secured by Google SSO. We’ll add a couple of basic features to it, and should be able to roll it out very soon. We’d be grateful for feedback on what plugins you might find useful – see the catalogue for more information. 

Cloud Engineering weeknotes, 21 January 2022

A quiet week; just as well, as we have been a few people down this week, making our already-small team even smaller. 

We’ve still got some stuff done, mostly on the support side of things – permissions, DNS changes, and restoring the connection between an EC2 instance and an S3 bucket for the Document Migration team. We also had a demo of the new HaloITSM system; we will be using this for all support requests from 31 January so please note that requests via Slack will not be picked up from that date. 

Tomasz and Cintia continue work on the firewalls, and specifically on Globalprotect. The work to split Globalprotect into two for “internal” and “external” applications is nearing completion. Cintia has also been supporting Frank with networking on the production Ansible infrastructure for websites. We always learn more than we think we have. 

The other main piece of work this week has been preparations to replace the wildcard SSL certificate. It expires in a few weeks, and we want to replace it with AWS-issued certificates, which will renew automatically. Thanks to AWS Config, Matt was able to track down all usage of the wildcard in AWS in record time. We’re now planning how and when to do the replacement exercise.

However, AWS Certificate Manager doesn’t allow exports, so to deal with services outside AWS that use our certificate, we’ll use a different method. We did a lunch and learn on this during the week. 
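
When planning a replacement like this, it helps to be precise about which hostnames a wildcard actually covers. The sketch below shows the standard rule that `*` matches exactly one left-most label. The function names and the `example.org` hostnames are hypothetical, and this is not the AWS Config query Matt used.

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name such as '*.example.org' covers hostname."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if not pattern.startswith("*."):
        # Not a wildcard: only an exact match counts.
        return pattern == hostname
    suffix = pattern[1:]  # e.g. ".example.org"
    if not hostname.endswith(suffix):
        return False
    left_label = hostname[: -len(suffix)]
    # The wildcard stands for exactly one non-empty label, so no dots allowed.
    return bool(left_label) and "." not in left_label


def hosts_using_wildcard(pattern: str, hostnames: list[str]) -> list[str]:
    """Filter an inventory of hostnames down to those the wildcard covers."""
    return [h for h in hostnames if wildcard_covers(pattern, h)]


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    inventory = ["app.example.org", "api.staging.example.org", "example.org"]
    print(hosts_using_wildcard("*.example.org", inventory))
```

Note that neither the bare domain nor a name two levels down is covered, so those need their own certificates whichever method is used.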

Although our plans for this sprint are to clear out the work in progress, much remains blocked. One of our suppliers is being slow to respond to a request, which in turn is delaying important work for Repairs Hub. We think this is now unblocked, but a second supplier is also being unresponsive on work for Social Care. This will be escalated.