Cloud Engineering weeknotes, 25 March 2022

We did something new this week: we got together in the HSC. Well, most of us, as a few team members have been sick this week. But for the first time, the majority of the team were in the same place at the same time, and it was magic. We did our show & tell with an actual audience and proved to our colleagues that we do, in fact, have legs.

We’ve done some actual work as well. The StagingAPIs account was migrated to the Hub on Thursday night with no significant issues, and it’s working well. This just leaves the Production account and a legacy account that is mostly unused but needs to be cleaned up. This has been an absolute labour of love, and although the migrations have taken significantly longer than planned, it’s better to do them slowly and safely than to break things for everyone.

Our work supporting the Mosaic restoration project is drawing to a close. This week we built the reporting server, and because of the limits of RDP connections, we have proposed using AppStream for the data analysts to do their work. There is a plan B if it doesn’t work, but AppStream is (for once) likely to be more cost-effective. We’re working with the AppStream team on this now.

We’ve also provided support this week to the Document Migration team, helping them set up additional EC2s to process the vast number of recovered eDocs into Google Drive. We’ve made sure that these have been set up in a cost-effective way, and the team will ensure that the instances are shut down when not in use.
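The shutdown side of this is easy to automate. As a minimal sketch (the tag name is hypothetical, and a scheduled Lambda or cron job would run something like this out of hours), stopping any migration instances left running might look like this with boto3:

```python
import boto3

ec2 = boto3.client("ec2")

# Find running instances tagged for the migration work.
# The "Project" tag value here is a hypothetical example.
reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:Project", "Values": ["document-migration"]},
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]

instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]

if instance_ids:
    # Stopped instances cost nothing for compute; only their EBS volumes are billed.
    ec2.stop_instances(InstanceIds=instance_ids)
```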

We’ve been able to get some platform work done this week, which is good. Some of this has been simply cleaning up code – I say simply, but it’s often harder to make code clean than to write it in the first place – and we’ve also improved how some of our components work. One example is a new CircleCI module which automates the IAM setup for CircleCI in new accounts. This was previously done manually, so the module is in line with our policy of automating as much as we can.
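For a flavour of what the module takes off our hands (the user, policy and bucket names below are hypothetical, and the real module is rather more involved), the manual IAM steps it replaces look roughly like this in boto3:

```python
import json

import boto3

iam = boto3.client("iam")

# Hypothetical names -- the real module derives these per account.
USER_NAME = "circleci-deployer"
POLICY_NAME = "circleci-deploy-policy"

# A deliberately narrow example policy; each pipeline gets only
# the permissions it actually needs.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::example-deploy-bucket/*",
        }
    ],
}

# Create the machine user CircleCI authenticates as...
iam.create_user(UserName=USER_NAME)

# ...attach its inline policy...
iam.put_user_policy(
    UserName=USER_NAME,
    PolicyName=POLICY_NAME,
    PolicyDocument=json.dumps(policy_document),
)

# ...and mint the access key whose ID and secret are stored as
# CircleCI project environment variables.
key = iam.create_access_key(UserName=USER_NAME)["AccessKey"]
print(key["AccessKeyId"])  # never log the secret itself
```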

Centralised logging has not been deployed due to sickness absence, but we’ve started some work to set up Athena on the CloudTrail logs. This will make it much easier to query the logs, especially if we need to audit actions performed by individuals. As we’ve said before, a secure platform is a good platform.
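To give a flavour of why this is useful (the database, table and bucket names are hypothetical and assume a CloudTrail table has already been defined in Athena), auditing one person’s recent actions becomes a single SQL query:

```python
import boto3

athena = boto3.client("athena")

# Hypothetical names -- ours will differ once the setup lands.
QUERY = """
SELECT eventtime, eventname, useridentity.arn
FROM cloudtrail_logs
WHERE useridentity.arn LIKE '%jane.doe%'
  AND eventtime > '2022-03-01'
ORDER BY eventtime DESC
LIMIT 100
"""

response = athena.start_query_execution(
    QueryString=QUERY,
    QueryExecutionContext={"Database": "security_audit"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)

# Poll get_query_execution() with this ID until the query completes.
print(response["QueryExecutionId"])
```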

Cloud Engineering weeknotes, 18 March 2022

Our good friend WordPress paid another visit this week, with more troubles on the Intranet. We’ve applied a manual fix for now, but we have been able to make progress on the long-term fix of better infrastructure. We’ve also had a useful meeting with a WordPress specialist who is carrying out a discovery on the estate.

Steady progress has been a theme this week, despite a few curveballs. A significant problem with Webreg in GlobalProtect was reported to us. We thought using the desktop client wasn’t an option because our users have Chromebooks… but a little digging revealed that Palo Alto has actually released a Chrome OS desktop client. We’ve tested it, and it works. We’re working with our Google and Devices teams to get it rolled out, which should happen early next week.

We have also migrated the DevelopmentAPIs account onto the Hub. This went without a hitch, and leaves just the Staging and Production accounts to go, which should follow soon. Both accounts will need some clean-up as they have applications that should be elsewhere, but that can be done later. The main applications are the GIS applications, and we have an agreed plan for that in April. 

Our support for the Mosaic restoration will draw to a close soon, but the essential infrastructure for their go-live is in a good place. We need to help with the reporting server but that can wait a few days, and won’t take long. 

On top of all this, there’s been some platform iteration as well. With thanks to the Social Care team, the backup service now includes DocumentDB instances. It’s fantastic to see a product team iterating one of our modules (reviewed by us) – this is the sort of thing we always envisaged from the start of this work way back when. 
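For context on what that sort of iteration involves, here is a minimal sketch assuming the service is built on AWS Backup (the plan ID, role and cluster ARN are all hypothetical): extending a backup plan to cover DocumentDB is essentially a matter of including the clusters’ ARNs in a backup selection.

```python
import boto3

backup = boto3.client("backup")

# Hypothetical identifiers -- the real values come from the backup module.
backup.create_backup_selection(
    BackupPlanId="example-backup-plan-id",
    BackupSelection={
        "SelectionName": "documentdb-clusters",
        "IamRoleArn": "arn:aws:iam::111111111111:role/example-backup-role",
        # DocumentDB clusters share the RDS ARN namespace.
        "Resources": [
            "arn:aws:rds:eu-west-2:111111111111:cluster:example-docdb-cluster"
        ],
    },
)
```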

We’ve also been tidying up the code for the HSCN connection and have been working on a centralised logging service. The latter collates all the different logs in a single place, making it easier for the Security team to review and inspect them. This links nicely with some work on permissions we have planned for next month, which will start with a new permission set specifically for that team (a sketch of what that might look like is below). A secure platform is a good platform.
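As a rough sketch of that permission-set work, assuming we use AWS SSO (IAM Identity Center) permission sets – the instance ARN, names and policy choice below are all hypothetical:

```python
import boto3

sso_admin = boto3.client("sso-admin")

# Hypothetical instance ARN -- discoverable via list_instances().
INSTANCE_ARN = "arn:aws:sso:::instance/ssoins-example"

# A read-only permission set scoped for the Security team's log reviews.
ps = sso_admin.create_permission_set(
    Name="SecurityLogReview",
    Description="Read-only access for the Security team to review logs",
    InstanceArn=INSTANCE_ARN,
    SessionDuration="PT4H",  # sessions expire after four hours
)["PermissionSet"]

# Attach an AWS managed policy as an example; the real set would be
# tailored to exactly what the team needs.
sso_admin.attach_managed_policy_to_permission_set(
    InstanceArn=INSTANCE_ARN,
    PermissionSetArn=ps["PermissionSetArn"],
    ManagedPolicyArn="arn:aws:iam::aws:policy/SecurityAudit",
)
```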

Modern Tools for Housing – Programme week notes 18/3/22

The Modern Tools for Housing programme is a suite of software products developed in collaboration between HackIT and Hackney Council’s Housing Department to support our staff and our tenant and leasehold residents. It currently covers housing repairs, home management, housing finance and support for our arrears collection team.

Highlights from our workstreams this week.

Managed Arrears – Silvia – workstream DM

  • Celebrations:
    • Completed screens for the Worktray, Notification and Search pages.
    • Integrated the screens into the Finance application.
    • Positive feedback from superusers on the User Interface (UI) for rent statements – post, email and print.
    • PDF production of rent statements for any specified period now works, provided the data is in the database.
    • Sign-off on the Architecture Decision Record (ADR) for Send File by Email.
    • GOV.UK Notify Application Programming Interface (API) set up and tested for Send File by Email. Files now appear as encrypted links for download (see the sketch after this list).
  • Challenges:
    • Testing in development with only a small dataset.
    • Finding an alternative to the Tenancy API for customer email addresses, as it is unmaintained.
    • Writing unit tests for edge cases.
    • Getting ready for User Acceptance Testing (UAT) sign-off for Send File by Email.
    • Figuring out how to integrate the property page with Finance, as they use two different APIs.
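The Send File by Email flow mentioned above uses GOV.UK Notify’s file-upload support. As a minimal sketch with the Notify Python client (the API key, template ID, address and personalisation field are placeholders; the real template defines its own field names):

```python
from notifications_python_client import prepare_upload
from notifications_python_client.notifications import NotificationsAPIClient

# Placeholder credentials -- real values live in configuration.
client = NotificationsAPIClient("api-key-goes-here")

with open("rent_statement.pdf", "rb") as f:
    client.send_email_notification(
        email_address="resident@example.com",
        template_id="00000000-0000-0000-0000-000000000000",
        personalisation={
            # Notify uploads the file and renders a secure, time-limited
            # download link wherever the template says ((link_to_file)).
            "link_to_file": prepare_upload(f),
        },
    )
```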

Finance – Kate – workstream DM

  • Celebrations:
    • New QA Emilia joined the team!
    • The release of the LH estimates MVP and Data Migration into Staging, so users can begin testing the functionality.
    • We made good progress in our transition over to Jira. Our backlog still needs a lot of refinement before we are in an organised agile flow, but the progress feels great.
  • Challenges:
    • Another issue was raised regarding an incorrect PRN; however, this was picked up quickly and a fix pushed to production.
    • The Finance team’s velocity means the SoWs can only run for six weeks in order to keep the SoW costings around the £250k mark, which means we have to write SoWs more frequently.

Manage My Home – Yvonne – workstream DM

  • Celebrate:
    • Welcome to our new QA, Hannah Maher – known as H!
    • Team bonding session in a ‘User Manual of Me’ workshop, which was fun and helped us get to know each other better. The team feels like it’s in a really good place now and has all the tools to start picking up delivery momentum once it has more capacity.
  • Concerns:
    • Our new front-end developer dropped out at the last minute.
    • We still haven’t delivered anything (mainly due to capacity issues and the team learning agile).

Repairs – Sarah – HackIT workstream DM

  • Celebrations:
    • Legal disrepair alerts are now being displayed
    • The team coped really well this week while we were a developer down
  • Challenges:
    • Some performance issues related to the cautionary contacts sheets API
    • The Purdy onboarding features won’t deliver any value until they’re all ready, so we’re going to have a period where we don’t release much

Another busy week in MTFH. Our two new QA engineers started this week and I’m looking forward to chatting to them on Monday. Partly I’m keen to hear about their conversations with our TA about how they’ll be contributing to the programme but I’m also going to ask them to write up the status of each workstream’s tech and data documentation as part of their onboarding. I’m keen to know how easy it is for new starters to get up to speed with our technology.

The Repairs online team has hit the ground running. I’m looking forward to hearing from them next week about standing up their existing open source product ASAP so we can show it to people and get ready for the integrations we need to do.

Recruitment for the new Repairs product team’s Product Manager has closed and we’ll be interviewing this week. Candidates for the other roles should start being sifted and put forward to us for interviews in the near future. At the same time, we’re finalising the makeup of the MMH product team next week and will then start looking for a Product Owner for that team very soon.

We’re still having some issues finalising some people’s roles in the programme – more discussions this week. We’re also finding the same thing as every other organisation at the moment – recruiting is hard. We’ve had two potential new starters drop out within two days of their starting date this week.

Our Finance workstream has been laying out plans for at least four releases over the next three sprints. This is great, as we’ve been blocked on releases for that team for some time and really want to get into the cadence of doing at least one release every week. We’re also going to be revisiting the Finance SoWs this week to get our next round of work signed off.

In Managed Arrears we’ve paused the release of our next SoW until two further pieces of work have been completed. Firstly, we need to reprioritise the roadmap following the introduction of new requirements to support our colleagues who look after arrears for our leasehold properties. Secondly, we need to finish mapping out the relationship between Managed Arrears and RentSense, our new third-party AI product which recommends to our Arrears Officers which residents to contact and how.

Finally – it’s going to be a sad week for the programme next week as our Delivery Manager for Manage My Home, Yvonne, is leaving Hackney to take up a Senior Delivery Manager position at BBC News. Yvonne has done a fantastic job working with the MMH team – first with our previous agency partner Amido and now Nudge. In both cases she was an exemplary servant leader – coaching the team in all the best practices of self organisation. She’s going to do an amazing job at the BBC. We should know on Monday who her replacement will be and I’ll introduce them next week.

Cloud Engineering weeknotes, 11 March 2022

It was Groundhog week for WordPress issues on the website, but after a lot of cross-team cooperation it has now been resolved. WordPress as a whole will hopefully become less of a strain on the team, as we have a meeting with a WordPress specialist to go over the current state of the estate and work out a way forward.

Some work has been done to start centralising CloudWatch logs from every account, aggregating them into a single logging account. Control Tower does some of this for us, but there are areas it doesn’t cover. Centralising the logs in this way will give us a base from which to eventually provide centralised monitoring and alerting across the whole platform, so it is a useful building block for the future.
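Mechanically, the pattern is that the central logging account exposes a CloudWatch Logs destination, and each source account subscribes its log groups to it. A minimal sketch of the source-account side (the log group, filter name and destination ARN are hypothetical):

```python
import boto3

logs = boto3.client("logs")

# Run in each source account: stream a log group to a destination
# the central logging account has created and shared cross-account.
logs.put_subscription_filter(
    logGroupName="/aws/lambda/example-function",
    filterName="ship-to-central-logging",
    filterPattern="",  # an empty pattern forwards every event
    destinationArn="arn:aws:logs:eu-west-2:111111111111:destination:central-logs",
)
```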

Some housekeeping has been done on the Palo Alto firewalls, removing configuration which was no longer needed or was left over from testing. More work has been done to make access to the firewalls’ management dashboard far more secure, implementing a VPN which further tightens the security around a vital part of the AWS platform. Finally in firewall-land, more work has been done to get Panorama up and running, which will give us a tool to manage our three firewall environments in one place whilst also giving us faster disaster recovery options.

Work is still ongoing to migrate the GIS systems to their own dedicated account. This work is important to bring some order to how our accounts are organised, whilst also making it possible for us to finish off the work to migrate our “legacy” accounts to our final network architecture.

We’re still collaborating closely with the security team, having had a very productive meeting about potential vulnerabilities in our setup and how we can give their team more direct access to AWS, so they can see for themselves where we have issues.

Beyond that, there has been the usual support and guidance that we provide around AWS access requests, 1Password requests (another orphaned service we’ve taken on) and cross-account communication for some of the ongoing data recovery activities.

Cloud Engineering weeknotes, 4 March 2022

Our WordPress woes have continued into this week, with the pipeline between WordPress and Netlify falling over. This has meant that content on the main website couldn’t be updated. After investigation, the issue proved to be a plugin error, but one which was having knock-on effects and wasn’t easily fixable. We now believe the error is on the front end, and we can’t do anything there; the Dev team will be approached for help. 

The WordPress instances have proven to be a significant drain in recent months. We’ll be taking this up with DMT again as it’s outside our remit. The new infrastructure will help, but providing this level of life support is actually blocking us from moving the instances over to it. 

In happier news, we’ve worked with the Data Recovery team to remove five EC2 instances and reduce their EBS volumes by 40TB. Resizing the volumes will save over $4,500 per month, and the saving is already reflected in our cost forecast for March. On the costs front, we’ve worked with the Data Platform team to move the Budgets module into the Infrastructure repo, so it’s available for all teams to use. We’d encourage every team to do so.
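For teams wondering what adopting the Budgets module involves conceptually, here is a rough boto3 sketch of creating an AWS Budget with an alert (the limit, threshold and email address are hypothetical, and the module’s actual interface will differ):

```python
import boto3

budgets = boto3.client("budgets")
account_id = boto3.client("sts").get_caller_identity()["Account"]

# Hypothetical limit and recipient -- each team sets its own.
budgets.create_budget(
    AccountId=account_id,
    Budget={
        "BudgetName": "monthly-account-budget",
        "BudgetLimit": {"Amount": "5000", "Unit": "USD"},
        "TimeUnit": "MONTHLY",
        "BudgetType": "COST",
    },
    NotificationsWithSubscribers=[
        {
            "Notification": {
                "NotificationType": "ACTUAL",
                "ComparisonOperator": "GREATER_THAN",
                "Threshold": 80.0,  # alert at 80% of the limit
                "ThresholdType": "PERCENTAGE",
            },
            "Subscribers": [
                {"SubscriptionType": "EMAIL", "Address": "team@example.com"}
            ],
        }
    ],
)
```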

The first of the three business grants applications has been moved out of the APIs accounts, prior to those accounts being migrated to the Hub. It was a good learning experience, and the steps have been documented. We’ve agreed to pause the other two applications for now. They may be decommissioned soon, and they’re not blocking the account migration. 

Our networking support for other projects is going well. The MESH client for Mosaic has been configured and can connect to Servelec. As the HSCN connection exists only in Production, we need to repeat the exercise there. We’ve also finally cleared up some confusion over the requirements for a Repairs product, setting a red line for the supplier, as the original design on their side would have presented a security risk for us.

We’ve also finally started the migration from standalone firewall configurations to managing them all with Panorama. We’re starting in the Dev environment, and there may be short-term outages as the Hubs rebuild. We’ll give notice of this, and Production work will, of course, be done out of hours.