Cloud Engineering weeknotes, 11 November 2021

Some weeks in this project definitely have themes, and this week’s has been documentation. As I said last week, we know there are gaps in our documentation – most of it exists, but it lives in our heads and practices rather than being codified in a single, easily available document. So this week, as people have had spare time, we’ve been writing documentation for the Playbook. I’ve even done some of it myself. There is a lot more to do yet, but you can check our progress. Thank you, Stuart, for the inspiration and motivation.

We are rapidly (yet it feels oddly slowly, at times) reaching v1.0 of our platform. Following the rebuild of the firewalls, we have been able to deploy Panorama to manage them. This will hopefully be the last major piece of work on the firewalls for a little while. There is one ongoing piece of work, though: setting up authentication groups on GlobalProtect to give more granular permissions. This is proving a little more difficult than expected, but we have a solution to test.

Account migrations rumble on; we had of course expected to have finished this work a couple of weeks ago. There’s not been much progress since last week, though we have started looking at the Websites account, and working out if it would be better to host the WordPress instances in containers instead of on EC2s. 

We have the necessary changes for e5 lined up with Advanced, and should be able to move that account and attach its new VPN next week. However, we are stuck on the Housing accounts due to competing priorities in MTFH. We anticipate that will be resolved next week, which will in turn unblock the move of the API accounts.

And then… we start iterating. 

For now, though, there’s also been a lot of support work this week. We’ve supported the Security Assurance team with their work, and created an EC2 instance for the new Canon Uniflow scanning service. We continue to iterate our GitHub policies, and are providing advice and guidance to several teams. We’ve also just enabled AWS Compute Optimizer, which scans our entire estate to identify any compute resources that are over- or under-provisioned.

Dare I say things are starting to settle and mature?

Data Platform Weeknotes: 11.11.2021

For more information about the HackIT Data Platform project please have a look at this weeknote on the HackIT blog.

Improving the Data Platform playbook

We have spent some time this week rethinking the structure of our Data Platform playbook. While we’re happy with a lot of the content that has already been added, we felt we needed to look at particular user journeys and at how to make the experience of using the playbook as comprehensive and accessible as possible.

We have thought about the particular needs of data scientists, analysts, engineers and managers and mapped the common and unique parts of the playbook that they may need to access. We are in the process of restructuring the playbook so that it uses a much clearer navigation menu for these users.


Collaborating with Manage My Home

We’ve met with the Modern Tools for Housing team to agree on a reusable process to stream data from their application. We hope that our data platform data engineers will be able to help the Manage My Home team in this process.

We have put together a document which explores the benefits of using Kafka rather than an S3 bucket for the data streaming process. Kafka is open-source software that provides a framework for storing, reading and analysing streaming data. We also need to consider how Kafka might work with the current .NET-based API architecture, and whether it has any limitations there.
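To give a feel for why a Kafka-style stream differs from a plain S3 drop, here is a toy, in-memory model of a topic as an append-only log that consumers read from at their own offsets. This is a sketch only – it is not the Kafka client API, and the record fields are invented for illustration:

```python
from dataclasses import dataclass, field

# Toy model of a Kafka-style topic: an append-only log where each
# consumer keeps its own read offset. Several downstream readers
# (analytics, ETL) can consume the same stream independently.

@dataclass
class Topic:
    log: list = field(default_factory=list)  # append-only record log

    def produce(self, record: dict) -> int:
        """Append a record and return its offset in the log."""
        self.log.append(record)
        return len(self.log) - 1

    def consume(self, offset: int, max_records: int = 10) -> list:
        """Read up to max_records starting at a consumer-held offset."""
        return self.log[offset : offset + max_records]

topic = Topic()
topic.produce({"tenure_id": 1, "event": "created"})  # offset 0
topic.produce({"tenure_id": 1, "event": "updated"})  # offset 1

# Each consumer tracks where it is; a new consumer can replay from 0.
batch = topic.consume(offset=0)
print(batch)
```

The point of the model is that the log is durable and replayable: a second consumer added later can start from offset 0 and see the full history, which a one-shot file drop does not give you.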

Simplifying creating Glue jobs in code

A Glue ‘job’ refers to the business logic that performs the extract, transform, and load (ETL) work in AWS (Amazon Web Services) Glue. When you start a job, AWS Glue runs a script that extracts data from sources, transforms the data, and loads it into targets.
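The extract → transform → load shape of such a script can be sketched in plain Python. A real Glue script would use the awsglue/PySpark libraries; the source data, transformation and target below are purely illustrative:

```python
# The extract -> transform -> load shape of a Glue job, in plain Python.
# In a real Glue script these steps would read from and write to S3 or
# a data catalogue; here they operate on in-memory lists for clarity.

def extract(source_rows):
    """Pull raw rows from the source (e.g. an S3 landing zone)."""
    return list(source_rows)

def transform(rows):
    """Apply business logic: drop incomplete rows, normalise fields."""
    return [
        {**row, "name": row["name"].strip().title()}
        for row in rows
        if row.get("name")
    ]

def load(rows, target):
    """Write the refined rows to the target (e.g. a curated S3 zone)."""
    target.extend(rows)
    return len(rows)

raw = [{"name": "  alice smith "}, {"name": None}, {"name": "BOB JONES"}]
curated = []
loaded = load(transform(extract(raw)), curated)
print(loaded, curated)  # 2 rows survive the transform
```

Keeping the three stages as separate, named steps is what makes a job template reusable: an analyst only needs to swap in their own transform logic.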

This week we have worked to ensure that the process for creating Glue jobs is clearly documented so analysts can easily create them. We are also producing some reusable Terraform templates (Terraform is a tool for managing the entire lifecycle of infrastructure as code) that can be copied and pasted, requiring no prior knowledge of Terraform. We also want the code organised so that an analyst working in a specific department knows exactly where to put code relating to that department.

Up next, we will be testing the process with analysts to make sure our work meets their needs and to decide whether further refinement is required.

Creating a refined (Tascomi) planning data set with a daily snapshot

Now that Tascomi data is getting into the platform every day, we are in a position to refine our workflow. This means that only data increments will be processed every day (through parsing and refinement). After this process, daily data increments will be incorporated with the previous version of the dataset to create a daily full snapshot. We’re still testing this approach and we’re hoping it will make Tascomi data (both current and historic) easy to access for planning analysts.
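The daily-snapshot step described above can be sketched as a merge keyed on a record identifier: yesterday’s full snapshot plus today’s increment, with updated records replacing old versions and new records appended. The field names (`id`, `status`) are illustrative, not the Tascomi schema:

```python
# Sketch of the daily-snapshot merge: previous full snapshot + today's
# increment -> new full snapshot. Rows are matched on a key field, so
# an updated record replaces its old version and new records are added.

def build_snapshot(previous_snapshot, daily_increment, key="id"):
    """Return a new full snapshot from the previous snapshot and the
    day's increment, matched on `key`."""
    merged = {row[key]: row for row in previous_snapshot}
    for row in daily_increment:
        merged[row[key]] = row  # update existing or insert new
    return list(merged.values())

yesterday = [{"id": 1, "status": "pending"}, {"id": 2, "status": "approved"}]
today = [{"id": 1, "status": "approved"}, {"id": 3, "status": "pending"}]

snapshot = build_snapshot(yesterday, today)
print(snapshot)
```

Because only the increment is parsed and refined each day, the expensive processing stays proportional to the day’s changes, while analysts still get a complete, current dataset.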

Using Redshift with Tascomi data

Redshift is a cloud-based data warehouse product designed for large-scale data storage and analysis. Exposing the Tascomi data in the Redshift cluster means that we now have daily loads into Qlik (a business analytics platform), with only the latest version of the data stored in tables for analysts to use. We have started to create a data model that pre-builds the associations between the tables for easier interrogation. Analysts can also use Redshift connectors into Google Data Studio.

Next Show & Tell – Friday 12th November

Our next Show & Tell is on the 12th of November, 12–12.30pm. Come along to find out more about what we are up to, and invite others who may be interested (the calendar invite is open). Email ana.bebic@hackney.gov.uk if you need any help. For anyone who can’t make it, don’t worry – we will record the session and post it on Currents/Slack afterwards.

HackIT 3.0 weeknotes #7

Week ending 12/11/21

It’s been a while since I’ve published any HackIT 3.0 weeknotes. There are a few reasons for this, including the fact that this is the type of change management work that doesn’t always fit a standard Agile sprint format.

The latest news is that, after taking some time over the summer to listen to your feedback and reflect, our Divisional Management Team (DMT) has arrived at a point where they can give a further update at the Strategy Show & Tell on 18 November, with more detail on the future shape of the service.

This will focus on the next iteration of proposals for HackIT, including more detail on what teams will be part of the new structure, their responsibilities, and the types of roles needed in those teams; as well as our estimated timeline for the next stages of the work leading to completion of the restructure.

The show and tell will be a chance for you to ask any questions you may have about the proposals, so please try to attend if you can.

What we did do vs what we didn’t do

Looking back over past 3.0 weeknotes – which I’ve republished on the revived HackIT blog so they’re all in one digital place for reference – it was reassuring to see the progress that has been made over the last few months.

It’s important to note that contributions made in the 3.0 workshops, alongside team and individual feedback, have all proved invaluable in helping reshape the latest proposals. This shows the value of our principle of working in the open.

Added to this, the first pilot product team – for Document Management – is up and running (you can read more about what they are working on in Lewis Sternberg’s weeknotes).

We are also starting to work out how to create a Managing Repairs product team as part of the Modern Tools for Housing programme. We’ve delayed this a little, as the team has been heads down delivering value to users and we didn’t want to distract them at a crucial stage.

Of the three proposed anchor roles we said we’d establish, we haven’t yet created any. We are making progress on a Product Management Lead role, and instead of a Standards and Assurance role we’re testing whether a part-time project led by Cate gives us enough capacity to think through how a standards and assurance service might work.

The Community of Practice coach has been less of a priority so far, but it’s still very much something we want to explore in the future. 

Recap for any new team members:

I’m the lead organisational change manager working to support DMT to deliver HackIT 3.0; the vision for this was first laid out in the ‘Towards our future shape’ document in March.

The development of HackIT 3.0 is an ongoing conversation, which everyone is invited to participate in. You can contact me directly, join the #hackit-3-point-0 Slack channel, and watch out for regular updates via Google Currents and show and tells.

Look forward to seeing you all at the Show and Tell next week.

Modern Tools for Housing – Programme week notes 5/11/21

Lots of excellent progress this week, but there’s some uncertainty in different areas which teams are grappling with. Here are the updates from our workstreams.

Finance – Jay – HackIT workstream DM

  • Next Show and Tell: 9th November 2021
  • Celebrations:
    • Pen test for MAA has been completed and passed
    • 5000+ letters sent to residents
    • Resolved the Housing Search API blocker so work can continue
  • Challenges:
    • MAA Bugs and issues
    • Hackney dev resource 

Manage My Home – Yvonne – HackIT workstream DM

  • Celebrations:
    • The team smashing through tickets and getting loads of functionality and bug fixes done including being able to create new tenures
    • Starting three exciting things in this sprint: Showing Repairs Hub detail; Starting to build first process (sole to joint); and complete the patches API
    • Hugo’s walkthrough of the Discretionary Officer process in the Show and Tell – really well received, with loads of audience engagement and questions :tada:
    • Really supporting each other: we’ve been tag-teaming this week to get a lot of things over the line in short timeframes – it has been intense!
  • Concerns:
    • Uncertainty about how we’ll deliver next phase as Amido contract ending
    • BA resource

Repairs – Sarah – HackIT workstream DM

  • Celebrations
    • Mobile Working launched with plumbers on November 1st
    • Addresses visible to the operative job list view – it’s the one feature every single operative has requested
    • Out of Hours repairs now being raised on Repairs Hub
  • Challenges
    • Bugs from the Mobile rollout
    • Dev capacity
    • Sync with DRS: we’ve increased this to every 15 minutes, but we’re still not picking up certain changes from within DRS, meaning some jobs are missing from the operative job list (though operatives still have the printed jobs, so these are still going ahead)

Challenges

The Delivery Managers aren’t reviewing the new programme documentation as quickly as I’d like. We all agreed what would have value, but the reality of all the daily delivery work they need to do, while working under high pressure with limited staff in key roles, means they haven’t been able to prioritise time to actually do it. We’ll continue to discuss and work out how we prioritise.

We have a dependency on the Document Storage team, which is a particular blocker for the delivery of Housing processes in Manage My Home. To manage that, I’m starting to consider a potential interim alternative solution.

We’re starting to think about how we transition from blended teams working in partnership with agencies to internal product teams. This will take time and careful thought and I’m starting to plan how we manage those handovers with the current agencies. 

Looking wider

This week I had the first of what will now be regular catch-ups with the head of the Housing Property and Asset Management (PAM) group. It went really well and we both learned a lot about each other’s team’s current work and future plans. There’s still quite a lot of uncertainty in this area around the purchase of a new PAM system and how that will fit in with the Asset microservice that we’ve built so ongoing conversations will be really useful.

Team and programme health 

We had our usual three-weekly MTFH team leads retrospective. Three strong themes emerged: a desire for less change in the people allocated to the programme; a better narrative to explain the great work we’ve done so far; and making sure new people who join can get up to speed more quickly. I’ll be speaking to people next week about the narrative – especially with the significant Housing Management Leaders meeting taking place in December. I’m also going to be working with the Delivery Managers to ensure they all have good processes for onboarding and offboarding team members.

One of the key items for next week will be planning for our workshop on the 17th to review the programme governance. All the people taking part will be there in person, and I’m very much looking forward to it.

Looking ahead

It’s good that we now have some clarity about how we’re moving forward with Managed Arrears. The existing system is now being used by the whole Income Services team, and thousands of letters have been sent to residents. We’ve also approved the work for new feature development on that system and for continued support. At the same time, the Income Services team are organising the pilot for RentSense and will report back by the end of the month on how it might fit alongside our current product to deliver extra value.

Cloud Engineering weeknotes, 5 November 2021

I was off last week; as I got back into things this week, I had a chance to reflect on how far we’ve come over the last 10 months. 

The catalyst for this was a chat with Stuart, our new (permanent!) Senior Engineer. Considering the circumstances of our formation, as a team we have built a stable, secure platform; we’ve coached each other and upskilled a number of Hackney colleagues who had no previous experience in AWS; we’ve made a welcoming, productive, and expert team which often puts the needs of others ahead of our own. 

And I’m proud of that. In my absence, the team kept going and did all the right things in the right way and welcomed a new member as if he’d always been here. But we just don’t give ourselves enough credit for what we’ve done and in the circumstances we’ve been working in. As I’ve said before, nobody would ever choose to do a cloud migration in our circumstances, and we have much to be proud of. 

Stuart’s arrival has given us a useful outsider’s view on what we’ve done and what’s missing. He’s given us a brain dump of the documentation he’d expect as a new starter, which we will work through over the next couple of sprints. Almost all of it exists already; we just need to publish it in the Playbook so that it’s in one place. He’s also started work on formalising our change and release processes so that we can avoid repeating some of the mistakes we’ve made in the last couple of months.

The account migrations proceed, though slowly. The work needed to move e5 and the Housing accounts is lined up, and we’ve started decommissioning unused resources (with the data backed up). There is a definite chain of events – Manage Arrears needs to be updated so that we can move Housing, which will enable us to move APIs, which will allow us to clean up those accounts and move things to more appropriate homes. 

We’ve made some additional security improvements this sprint. We have a module to automate much of the Windows Server patching, which we spoke about in our lunch & learn. We’ve also made some changes to the GitHub repo to restrict who can approve PRs and enabled Branch Protection. 

The firewalls have been completely overhauled in the last two weeks. We’ve adopted a new licensing model that saves a lot of money, and the revised Terraform allows for faster redeployments. Importantly, we’re now able to use Panorama to manage the full suite of firewalls, and this means we only need to make a configuration change in one firewall – Panorama will manage the deployment of that change to all other devices. 

Thank you for your patience while we’ve rebuilt the firewalls as we know there have been a lot of outages and cancellations – but that does neatly illustrate why we need to tighten up our own change and release processes!