Cloud Engineering weeknotes, 7 January 2022

With the code freeze extending into the first week after Christmas, it’s been a quiet week. Or it was going to be, until a routine WordPress update went wrong. The team swarmed on the issue this morning and, at the time of writing, all affected websites have been restored.

Before that happened, the team was regrouping after the break. We started by refreshing our memories of what we had been doing before Christmas, especially where that work had been handed over from one of our departed colleagues. However, because the bulk of that work – GIS apps migrations, account migrations – relates to production, we won’t be doing it until next week.

We had a little reboot session during the week to pick up on some issues from our last retro. The outcome is that we’re going to start reserving time for learning each week – a Friday afternoon, though we will still respond to urgent issues, of course – and we will start to have a day in the office once a month. We’re also looking at how we can spread DevOps skills beyond our team… more on this as we solidify ideas. It will take a little while for us to work out what our capacity is now, but this first week back has been a (relatively) gentle entry into the new year, with a new team shape.

Modern Tools for Housing – Programme Week Notes 7/01/22

It’s our first week back after everyone’s had a very well-deserved break. We’re still in firebreak, with the teams easing back into delivering at pace. We also have folks still on holiday or off with Covid, so it’s a little slower in general at the moment. Back to normal next week.

Finance – Silvia – HackIT workstream DM

  • Next Show and Tell: 18th January 2022
  • Celebrations:
    • Data Migration Plan – Our data migration plan has been shared with Rashmi / Mirela, detailing what steps we plan to take, which entities the migration would affect, which APIs are to be called and the changes that need to be made. The Charges data migration coding work has been completed, coding has begun for the Accounts API, and the other APIs will follow next week.
    • Good progress has been made on the Leasehold Estimate Calculations work, with follow up meetings arranged next week with members of the Finance and Leasehold Service Team to answer some remaining queries.
  • Challenges:
    • We experienced some technical challenges this week in providing access to the required application and creating accounts for the pen testers. These issues have now been resolved and the pen testing will start on Monday 13th January.
    • Data migration will take place after the pen testing.

Manage My Home – Yvonne – HackIT workstream DM

  • Celebrations:
    • Data Platform work 
  • Challenges:
    • Very small team

Repairs – Sarah – HackIT workstream DM

  • Celebrations:
    • Deployed an urgent bug fix for the operative split issue. This applies to jobs raised since the deployment.
    • Started communications with gas engineers and electricians in preparation to onboard them from next week.
  • Challenges:
    • A lack of team presence in the depot for onboarding is going to be quite a struggle, but we’re thinking about how best to support operatives remotely as well as working with supervisors who will be in the office
    • We lack a clear goal this sprint: there is plenty to do around bonus, but the criteria for some of the other work are less well understood

Bit of a minimal update this week as we’re still on firebreak. There are lots of follow-up discussions now that we’ve concluded the new agency contract to continue assisting us with delivery of the programme. Our first priority is to stand up a new team to deliver Manage My Home now that the previous agency has rolled off. At the same time we’re starting to plan how to transform the Repairs workstream into a product team – including the related recruitment strategy. Finally, we’re thinking about the overall structure of the programme itself and whether it’s set up to best deliver what we need for Hackney Housing in 2022.

I’m going to leave it at that for now – back with a lot more detail next week.

Data Platform weeknotes 15, 07.01.2022

For more information about the HackIT Data Platform project please have a look at this weeknote on the HackIT blog.

We have had a slightly quieter period prior to Christmas with lots of team members taking time to rest and rejuvenate after a very busy year. We now start 2022 in an introspective mood as we continue to take stock of our achievements and reassess our ongoing project priorities and ways of working.

We are holding a team-wide project retro in which we hope not only to celebrate our achievements but also to discuss aspects of the project which we need to improve or iterate on. In addition, we are very much looking forward to presenting at the HackIT Strategy Show & Tell on the 13th of January.

We have continued to focus on Tascomi Planning data, Manage My Home housing data and our move to a Production environment.

Making housing data available for reporting

We’ve been collaborating with developers from the Manage My Home team to set up an event streaming process to put data from the platform APIs (e.g. the tenure API and person API) into the data platform. We’ve set up Kafka – open-source software that provides a framework for storing, reading and analysing streaming data – to receive the platform API data in the Data Platform. We continue to work with the Manage My Home team to create a Lambda which sends the data to Kafka, and so into the data platform.
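For the curious, here’s a minimal sketch of what the Lambda producer side of this could look like, assuming a kafka-python client. The topic name, event shape and environment variable are illustrative rather than the team’s actual implementation:

```python
# Minimal sketch of a Lambda that forwards platform API events to Kafka.
# Topic name, event shape and broker configuration are illustrative only.
import json
import os

from kafka import KafkaProducer  # kafka-python

producer = KafkaProducer(
    bootstrap_servers=os.environ["KAFKA_BROKERS"].split(","),  # MSK broker list
    security_protocol="SSL",                                   # MSK uses TLS by default
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)


def handler(event, context):
    """Receive an entity-changed event (e.g. from the tenure or person API)
    and publish it to the data platform's Kafka topic."""
    records = event.get("Records", [event])
    for record in records:
        producer.send("housing-entity-events", value=record)  # hypothetical topic
    producer.flush()
    return {"published": len(records)}
```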

We have had some issues with the setup, including finding an alternative way to set up the infrastructure as code. This is because we’re using an AWS-managed version of Kafka which isn’t fully supported by Terraform or CloudFormation. We’ve now determined that we need to use an API for this, and we hope this work will be completed soon.
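One possible reading of “use an API” is to drive the Kafka admin API directly where Terraform and CloudFormation fall short, for example to create topics. The sketch below is purely illustrative – the topic name and settings are placeholders:

```python
# Sketch: creating a topic via the Kafka admin API, since the managed (MSK)
# cluster can't do this through Terraform or CloudFormation.
import os

from kafka.admin import KafkaAdminClient, NewTopic  # kafka-python

admin = KafkaAdminClient(
    bootstrap_servers=os.environ["KAFKA_BROKERS"].split(","),
    security_protocol="SSL",
)

# Placeholder topic name, partition count and replication factor.
admin.create_topics([
    NewTopic(name="housing-entity-events", num_partitions=3, replication_factor=2),
])
```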

Providing a read-only production environment for data and processes

We have moved closer to our goal of moving the data platform to a production environment. A key part of this work has been the automation of Redshift configuration which was previously done manually by the team. Redshift, a data warehousing product, can be used by visualisation tools such as Qlik as a proxy to connect to data held within the Data Platform.

With the configuration process automated, departments onboarded to the platform will have consistent, reusable access to the data platform from multiple environments without waiting for manual configurations of the service. This will also ensure that permissions and security remain consistent between setups.
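As an illustration of the kind of configuration being automated (not the team’s actual scripts), the same steps could be driven through the Redshift Data API; the cluster, database, schema and user names below are invented:

```python
# Sketch of automating Redshift configuration with the Redshift Data API,
# so onboarding a department no longer needs manual setup.
# Cluster, database, schema and user names are placeholders.
import boto3

redshift_data = boto3.client("redshift-data")

STATEMENTS = [
    "CREATE SCHEMA IF NOT EXISTS housing",
    "CREATE USER qlik_reader PASSWORD DISABLE",  # assumes IAM/federated auth
    "GRANT USAGE ON SCHEMA housing TO qlik_reader",
    "GRANT SELECT ON ALL TABLES IN SCHEMA housing TO qlik_reader",
]

for sql in STATEMENTS:
    redshift_data.execute_statement(
        ClusterIdentifier="data-platform-prod",  # placeholder cluster name
        Database="data_platform",
        DbUser="admin",
        Sql=sql,
    )
```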

We have also worked to migrate our Qlik EC2 instance from the Production API account to the Production Data Platform account. However, we’ve run into a few blockers:

  • Some of the existing connections used by Qlik in the Production API account are not functioning as expected in the Production Data Platform account and actions are being taken to test and resolve these issues.
  • The cloud engineering team has been configuring AppStream against the Production Data Platform version of Qlik to ensure continued service when we switch over to the new account. However, the account that holds AppStream has reached a soft-limit on the number of instances running and we are awaiting resolution by AWS.

In addition to moving Qlik to the Production Data Platform account, we have been working with the Cloud Engineering team to provide an alternative connection method that does not require AppStream. This is being provided in the form of a Global VPN solution, which we have already trialled successfully on our Staging account. We are now waiting for the implementation of this solution on the Production accounts, which we hope to trial after the migration to the Production Data Platform account.

Making Tascomi data available to planning analysts so they can produce the reports they need

A daily snapshot of the Planning data is now being produced in the refined zone. This will make changes in Tascomi available to users within 24 hours. From a data storage perspective, we will only be processing changed records, thereby reducing costs and resource use. Data quality checks have also been implemented to ensure that incoming data is of an acceptable standard before being moved through the data pipeline. Users, including analysts, can be reassured that data has been checked for issues such as duplicate records.
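As a rough sketch of the approach (with invented table, column and path names rather than the real pipeline’s), the changed-records-plus-deduplication step might look something like this in PySpark:

```python
# Sketch: build the daily refined-zone snapshot from changed records only,
# with a simple de-duplication check before the data moves on.
# Column and path names are illustrative, not the real pipeline's.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("tascomi-daily-snapshot").getOrCreate()

# Only records that changed since the previous run are processed.
changed = spark.read.parquet("s3://data-platform-raw/tascomi/increment/")

# Keep the latest version of each application; anything else is a duplicate.
latest = Window.partitionBy("application_id").orderBy(F.col("last_updated").desc())
snapshot = (
    changed
    .withColumn("row_num", F.row_number().over(latest))
    .filter("row_num = 1")
    .drop("row_num")
    .withColumn("snapshot_date", F.current_date())
)

# Write the day's snapshot into the refined zone, partitioned by date.
snapshot.write.mode("overwrite").partitionBy("snapshot_date").parquet(
    "s3://data-platform-refined/tascomi/daily_snapshot/"
)
```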

For more information about how the Tascomi Planning Data Ingestion process works, please have a look at our playbook documentation, which has been recently updated with information about the daily snapshot.

The data is available in Qlik and the majority of the Planning KPIs that existed prior to the cyberattack have been rebuilt. We’ll be working alongside Planning colleagues to produce the reports they need, e.g. regarding application response times.

Up next

  • Take the opportunity of entering into a new year, project phase, and contract to reflect as a team and share this at the Strategy Show & Tell
  • Continue to trial Kanban as a way of planning our team’s work
  • Continue to work with the Modern Tools For Housing teams to ingest data into the platform, and understand more about their end reporting requirements so that we can plan the transformation that may be needed
  • Investigate how the Capita Insight tool, and the databases it creates, could help us ingest Council Tax data into the platform

Cloud Engineering weeknotes, 23 December 2021

A short week, a quiet week, a week where we’re trying to tie up loose ends and be ready for January. This started with the account migrations; we’ve now moved Housing-Staging onto the Hub, and will move Housing-Prod after the code freeze. The GIS apps migration is in a similar position, with the staging environment having been tested and all found working. Although we had wanted to do production before Christmas, deploying on a Friday is never a good idea – and an even worse idea when that Friday is Christmas Eve. 

The websites migration is also going well, with all staging sites moved and working well. We’ve been able to identify some sites that can be shut down (with a backup held), and we will be in a good position to move the production sites after the freeze. That will leave just the API accounts and we will finally be done. 

There has been other work this week. We’ve restricted the ability to create certain networking resources, and have implemented AWS Federation for GitHub Actions. Both of these changes will improve our security. 
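For context on what the GitHub Actions federation involves, here’s a rough sketch of the AWS side using OIDC. The role name, repository filter and thumbprint are placeholders rather than our actual configuration:

```python
# Sketch: registering GitHub's OIDC provider in AWS and creating a role that
# GitHub Actions workflows can assume without long-lived access keys.
# Names, the repo condition and the thumbprint are placeholders.
import json

import boto3

iam = boto3.client("iam")

provider = iam.create_open_id_connect_provider(
    Url="https://token.actions.githubusercontent.com",
    ClientIDList=["sts.amazonaws.com"],
    ThumbprintList=["6938fd4d98bab03faadb97b34396831e3780aea1"],  # example value
)

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Federated": provider["OpenIDConnectProviderArn"]},
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
            "StringLike": {
                # Restrict which repositories may assume the role (placeholder org).
                "token.actions.githubusercontent.com:sub": "repo:LBHackney-IT/*"
            }
        },
    }],
}

iam.create_role(
    RoleName="github-actions-deploy",  # placeholder role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
```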

We’ve also done some work on costs. We’ve helped two teams identify high or unusual costs in their accounts, and they’re taking action to address them. We’re also working on allowing access to the staging version of Mosaic through GlobalProtect; at the moment this runs through AppStream, so switching before testing ramps up will avoid a lot of spend. It’s always good to save money.

We say goodbye to most of our colleagues from Digi2al this week. We genuinely couldn’t have done this work without them, and we have all learned so much over the last year. Alliu, James, Tom, and Zoli, thank you so much for everything you’ve done. You will be missed. 

Cloud Engineering weeknotes, 17 December 2021

Another week of lots of under-the-hood progress, especially on the account migrations. Having been blocked for so long, the path is now clearing.

The work we needed on the application layer in the Housing accounts has been completed, and huge thanks to the MTFH Finance team for prioritising that. This means we are now able to roll both Staging and Production onto the Hub. The GIS apps are in a similar state: Staging apps have been migrated and the final testing is being done. We just need to move Production. 

Once these final migrations are done, we will be able to move the API accounts to the Hub later in January. We’ve identified some more business applications in the API accounts, and they can move later into their own account. On top of this, the Website migrations are going well, with the Staging version of Find Support Services deployed and working in the new Ansible container. We’ll move the remaining Staging versions shortly – performance improvements mean a deployment is now taking just 90 seconds. 

Beyond that, we’ve been plugging away at GlobalProtect, documentation, and support. On GlobalProtect we have hit a few bumps with having separate instances for internal and external applications, but we’re getting there, with a few tricks still to try. Progress on authentication groups is happening again – thanks to Mario for his work here.

We are now getting ready for most of our Digi2al colleagues to roll off the project, so there’s a lot of close working on the migrations and a lot of documentation going on. We’ve done a lot over the last year and it’s time for a break, but please do be aware that from January we will have a smaller team. As a result, we will be going a bit more slowly, so please give us as much notice as possible if you need something from us.