Cloud Engineering weeknotes, 21 January 2022

A quiet week; just as well, as we have been a few people down, making our already-small team even smaller. 

We still got some stuff done, mostly on the support side of things – permissions, DNS changes, and restoring the connection between an EC2 instance and an S3 bucket for the Document Migration team. We also had a demo of the new HaloITSM system; we will be using this for all support requests from 31 January, so please note that requests via Slack will not be picked up from that date. 

Tomasz and Cintia continue work on the firewalls, and specifically on GlobalProtect. The work to split GlobalProtect into two, for “internal” and “external” applications, is nearing completion. Cintia has also been supporting Frank with networking on the production Ansible infrastructure for websites. We always learn more than we think we have. 

The other main piece of work this week has been preparations to replace the wildcard SSL certificate. It expires in a few weeks, and we want to replace it with AWS-issued certificates, which will renew automatically. Thanks to AWS Config, Matt was able to track down all usage of the wildcard in AWS in record time. We’re now planning how and when to do the replacement exercise.

However, AWS Certificate Manager doesn’t allow certificates to be exported, so for services outside AWS that use our certificate, we’ll use a different method. We did a lunch and learn on this during the week. 
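In the meantime, a simple way to keep an eye on a certificate from outside AWS is to check what a service actually presents over TLS. A minimal sketch (the hostname is a placeholder, not one of our services):

```python
import socket
import ssl
from datetime import datetime, timezone

def days_until(not_after: str) -> int:
    """Days remaining, given a certificate's notAfter string, e.g. 'Jan  1 00:00:00 2030 GMT'."""
    expires = datetime.strptime(not_after, "%b %d %H:%M:%S %Y %Z").replace(tzinfo=timezone.utc)
    return (expires - datetime.now(timezone.utc)).days

def cert_days_remaining(hostname: str, port: int = 443) -> int:
    """Connect over TLS and report how many days the presented certificate has left."""
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=5) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            return days_until(tls.getpeercert()["notAfter"])
```

Running `cert_days_remaining("example.org")` against each service that uses the wildcard would flag anything still presenting the old certificate as the expiry date approaches.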

Although our plans for this sprint are to clear out the work in progress, much remains blocked. One of our suppliers is being slow to respond to a request, which in turn is delaying important work for Repairs Hub. We think this is now unblocked, but a second supplier is also being unresponsive on work for Social Care. This will be escalated. 

Data Platform Weeknote 17: 24.01.2022

For more information about the HackIT Data Platform project please have a look at this weeknote on the HackIT blog.

Help us improve how and where we communicate

Thanks to those of you who have taken the time to fill in our feedback form. There’s still time if you’d like to voice your opinions about how we can best improve the way that we communicate with everyone who is interested in the work we are doing. 

Update on how Planning data in the platform is being used

This week we had some brilliant feedback come through from our collaborators in Planning. Previously on the data platform project we worked hard to ingest planning data from the Tascomi API so that planning analysts can produce the reports they need. It has been a challenging process at times, but we heard this week that our collaborators are extremely happy with the dashboards that have been created using data ingested by the platform.

We hope that managers will be able to use the information to inform their planning and decisions, and that in turn residents will benefit from the insight being gained about the way they use the council’s services.

Whilst we’ve been able to recreate the vast majority of Planning’s KPIs through the data we’ve already ingested, there were still some tables we couldn’t access via the Tascomi API. This was unblocked this week, so we’re using it as an opportunity to test our process and documentation by onboarding two D&I analysts, who are now able to add new tables to the pipeline. This not only means we have more data in the platform, but also that more people are able to get data in.

Making housing data available for reporting

We are working with developers from the Manage My Home team to set up an event streaming process to put data from the platform APIs (e.g. tenure API, person API) into the data platform. We hope that this will allow housing services to see what data can be used within dashboards to provide important information to service managers and product owners – for example, how many tenures are created or amended within Manage My Home. 

For a more detailed breakdown of the process involved in this, please have a look at weeknote 16.

Housing data challenges: This week we have been testing a Lambda function in the development scratch account that gets a tenure from the development tenure API and then pushes a message to a Kafka cluster in the data platform account. We’ve had to make some small changes as we’ve gone along. We are nearly there, but are still working out the correct networking settings for the Kafka cluster, which is proving a bit tricky. We’ve also had some staff absence due to Covid, which has slowed us down a little.
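In outline, the Lambda looks something like the sketch below. The API URL, topic name, and event envelope are illustrative assumptions, and the Kafka send is left as a comment because the cluster settings are still being worked through:

```python
import json
import urllib.request

# Placeholder endpoint for the development tenure API (not the real URL)
TENURE_API_URL = "https://example.org/development/api/v1/tenures"

def build_event(tenure: dict) -> bytes:
    """Wrap the API response in a small event envelope before sending it to Kafka."""
    return json.dumps({"eventType": "TenureUpdated", "entity": tenure}).encode("utf-8")

def handler(event, context):
    """Fetch a tenure from the development tenure API and forward it as a Kafka message."""
    with urllib.request.urlopen(f"{TENURE_API_URL}/{event['tenureId']}") as resp:
        tenure = json.load(resp)
    message = build_event(tenure)
    # producer.send("tenure-events", message)  # e.g. kafka-python's KafkaProducer,
    #                                          # once the cluster networking is settled
    return {"bytesSent": len(message)}
```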

Future proposal for ingesting historical data 

We presented at the HackIT Technical Architecture meetup and discussed our proposal for a way to stream historical data. Currently, event streaming will only supply us with data from new events (e.g. a new person is created, or a tenure is updated) but won’t provide any historical data. Our proposal includes some changes to the platform APIs, which will require coordination with our colleagues in the dev team so that the work carefully considers all of the affected dependencies.

Next: Meet with the dev team to refine our proposal for streaming historical events, and once we’re all on the same page we’ll likely take this to the Technical Design Authority as it’s a fairly wide-reaching proposal.

Backfilling data from our production environment to pre-production

We have completed a lot of work which will enable our move to a production environment. For a more detailed breakdown of this work, please refer to weeknote 16.

Making Council Tax data available for reporting

We’ve been exploring the database created by Capita’s Insight tool as a means to get Revs & Bens data into the platform. Last week our initial investigations suggested it had most of the key tables used in previous analytical outputs relating to Council Tax, Housing Benefit and Business Rates.

This week we’ve reviewed our findings with our collaborator in Council Tax, Dave Ellen. We still think the Insight database gives us approximately 90% of what we need, but it’s frustrating that the other 10% available in the live Ingres database isn’t there. We’ve been working with Chris Lynham to get further access to see if these tables could be hiding elsewhere within Insight or whether we really do need access to the full database.

Next: Decide whether the Insight database is sufficient (or at least good enough for now) and run a tech spike on how to ingest this into the platform.

Developing a community of practice

We are keen to connect with more analysts across the council and get more key people engaged with our work. We have come up with an idea for a superuser group of analysts who we hope will build the foundation of a thriving data platform community at Hackney. We will be planning some meetups and workshops in the near future. We’ve also been asking ourselves, ‘What are all the things we need to improve in order to onboard new users easily?’

Data Platform Weeknote 16: 17.01.2022

For more information about the HackIT Data Platform project please have a look at this weeknote on the HackIT blog.

This week has been all about taking time to reflect upon phase 1 of the Data Platform work and to celebrate our achievements since March 2020. We held a team retro and workshop in which we asked questions like ‘What are we proud of?’ These gave us the opportunity to reflect upon how far we have come in terms of our personal development and how we have been able to apply that to our work on the platform. 

We also asked ourselves a series of ‘How might we…’ questions: ‘How might we improve time to reliable insights?’ and ‘How might we onboard new users?’, to mention a couple. We are committed to continuing our work on the product roadmap, but for now we have come up with many ideas to guide our thinking.

This process has also led us to a discussion about the way we communicate our progress with Hackney staff who, for various reasons, may be interested in the progress the Data Platform team is making. We decided to use a feedback form to gauge how we can best improve the way we communicate with everyone who is interested in our work. We’d appreciate it if everyone could take a couple of minutes to fill in the Google Feedback form.

We have continued to focus on ingesting Manage My Home housing data, moving to a Production environment, and researching the usability of the Capita Insight tool for ingesting Council Tax data into the platform. 

Making housing data available for reporting

Reporting from the suite of Modern Tools for Housing applications is currently very limited or not available at all, so we’ve been collaborating with developers in the Manage My Home team to get this data into the platform. The end goal is that this data can be used within dashboards to provide important information to service managers and product owners – for example, how many tenures are created or amended within Manage My Home. There are four broad steps to this work:

  1. Getting the data into the platform – In progress

We are working with the Manage My Home team to develop a new, reusable event streaming process to collect data from the platform APIs (e.g. tenure API, person API) and put it into the data platform. We’ve been working on several elements of this process:

  • Development of a Lambda which sends the message from the API to Kafka. Last week we ran end-to-end testing on this Lambda and have worked to fix any issues.
  • Setting up Kafka, the tool we’re using to receive the data in the platform. During our setup of Kafka we ran into issues with Kafka Connectors, an extension to the AWS implementation of Kafka that facilitates the deployment of connector modules. In our case we needed to use this to deploy an S3 writer module that would allow Kafka to stream data directly to S3.

We hope to have these elements in place this week so that we can test the whole process. However, this will only supply us with data from new events that are streamed (e.g. a new person is created, or a tenure is updated) but won’t provide any historical data. We’ve developed a proposal on how we’d do this via changes to the platform APIs which we’ll be discussing with the dev team this week.
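For illustration, the S3 writer module mentioned above is deployed by giving Kafka Connect a connector configuration along these lines. This is a sketch using Confluent’s S3 sink connector; the connector name, topic, and bucket are placeholder assumptions, not our actual settings:

```python
# Illustrative Kafka Connect S3 sink configuration (Confluent S3SinkConnector).
# All names and the bucket are placeholders, not the team's real settings.
s3_sink_config = {
    "name": "s3-sink-tenure-events",
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "topics": "tenure-events",                         # topic(s) to drain to S3
        "s3.bucket.name": "example-data-platform-landing",  # landing bucket in the platform account
        "s3.region": "eu-west-2",
        "flush.size": "1000",                               # records written per S3 object
    },
}
```

Posting this JSON to the Kafka Connect REST API creates the connector, after which messages on the topic are written straight to S3 without any custom consumer code.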

  2. Transforming the data so it’s ready for analysis – Scoping

Once we have the data in the platform, we’ll need to write scripts to transform this data so that it is ready for analysis. For example, we’d need to bring together multiple event streaming files to develop a single daily snapshot. We are still scoping out how much transformation is required, as this will be led by the reporting needs. We started the week thinking this was to provide tenancy information to the housing finance team, but this need can actually be met by joining up the Manage My Home and Housing Finance tools directly rather than going through the data platform. We’re planning to catch up with the MTFH team next week to get more clarity.

  3. Connecting the data to a BI tool (e.g. Qlik) – Complete, using processes already in place
  4. Making dashboards – Not started

As with data transformation, we need more clarity on the reporting needs and priorities before we can scope the dashboards.
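The transformation described above – bringing multiple event streaming files together into a single daily snapshot – essentially means keeping only the latest event per entity. A minimal sketch (the field names are illustrative, not the real schema):

```python
def daily_snapshot(events):
    """Reduce event records to one row per entity, keeping the most recent event.

    Each event is a dict with at least an 'entityId' and a sortable
    ISO-8601 'timestamp'; later events overwrite earlier ones.
    """
    snapshot = {}
    for ev in sorted(events, key=lambda e: e["timestamp"]):
        snapshot[ev["entityId"]] = ev  # later events replace earlier ones
    return list(snapshot.values())
```

In practice this would run as a Glue/Spark job over a day’s worth of files rather than plain Python, but the shape of the reduction is the same.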

Providing a read-only production environment for data and processes

We have moved closer to our goal of moving the data platform to a production environment. A key part of this work has been the automation of Redshift configuration which was previously done manually by the team. Redshift, a data warehousing product, can be used by visualisation tools such as Qlik as a proxy to connect to data held within the Data Platform.

With the configuration process automated, departments onboarded to the platform will have consistent, reusable access to the data platform from multiple environments without waiting for manual configurations of the service. This will also ensure that permissions and security remain consistent between setups.
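As a concrete example of the kind of statement such automation issues: pointing Redshift at data catalogued in the platform is typically done with an external schema over the Glue Data Catalog. A hedged sketch, in which the schema, database, and role names are placeholder assumptions:

```python
def external_schema_sql(schema: str, glue_db: str, role_arn: str) -> str:
    """Build the CREATE EXTERNAL SCHEMA statement that exposes a Glue
    Data Catalog database to Redshift (Spectrum) under a local schema name."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema} "
        f"FROM DATA CATALOG DATABASE '{glue_db}' "
        f"IAM_ROLE '{role_arn}'"
    )

# Example (all names are illustrative):
statement = external_schema_sql(
    "housing",
    "data_platform_glue_db",
    "arn:aws:iam::123456789012:role/redshift-spectrum-role",
)
```

Generating and running statements like this per department, rather than typing them by hand, is what makes the access consistent and repeatable across environments.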

We have also worked to migrate our Qlik EC2 instance from the Production API account to the Production Data Platform account. However, we’ve run into a few blockers:

  • Some of the existing connections used by Qlik in the Production API account are not functioning as expected in the Production Data Platform account and actions are being taken to test and resolve these issues.
  • The cloud engineering team has been configuring AppStream against the Production Data Platform version of Qlik to ensure continued service when we switch over to the new account. The AppStream instance has been created and we now need to enable Google Single Sign-On to test functionality.

In addition to the requirements of moving Qlik to the Production Data Platform account, we have been working with the Cloud Engineering team to provide an alternative connection method that does not require AppStream. This is being provided in the form of a Global VPN solution which we have already successfully tried on our Staging account. We are now waiting for the implementation of this solution on the Production accounts which we hope to trial after the migration to the Production Data Platform account.

Making Council Tax data available for reporting

We’ve been trying to get data from Academy (our application for Council Tax, Business Rates and Housing Benefit) into the platform, as these are crucial services that need reporting, as well as strategic datasets with a lot of potential for cross-service use. After looking at several ways to ingest data from Academy (see this weeknote for more info), we have decided to investigate the possibilities offered by Academy’s Insight tool. We don’t plan to use the Insight tool itself, because it’s a separate business intelligence tool for creating dashboards and would silo off this data rather than help democratise it. However, setting up this tool creates a reporting database on top of the live database, and it’s this reporting database that we’re now trying to connect to the data platform. 

We weren’t sure if it would contain all the data we might need, but initial investigations suggest it has most of the key tables used in previous analytical outputs relating to Council Tax, Housing Benefit and Business Rates. 

The next step is to determine how to best ingest the data into the platform so that we can fully test the completeness of the data to meet the requirements of colleagues in Revenues and Benefits. We will need a tech spike to determine this, but are prioritising the MTFH data first.

Up next

  • Continue to work with the Modern Tools for Housing teams to ingest data into the platform, and understand more about their end reporting requirements so that we can plan the transformation that may be needed
  • Start a troubleshooting section of our Playbook, starting with a partitioning issue that’s cropped up twice now for parking users when data tables get very large

Modern Tools for Housing – Programme Week Notes 14/01/22

Our first full week back following the post-holidays firebreak, and all the teams are getting back into full-steam working. The latest updates from them are below.

Finance – Silvia – HackIT workstream DM

  • Celebrations:
    • Great progress has been made in our Programme Strategy and Planning meetings, with the Amido handover arranged for next week. 
    • Pen testing started at the beginning of this week
  • Challenges:
    • Additional effort is needed to ensure that all solutions (APIs and frontend) are adequately documented. This will aid in sharing understanding of the architecture, implementation, and approaches to the problems that need to be solved.

Manage My Home – Yvonne – HackIT workstream DM

  • Celebrations:
    • It is becoming clearer what the team will look like with Nudge
  • Challenges:
    • Misconceptions and misunderstandings around data in the Person, Property and Tenure microservices, and how other Housing services can access this data. 

Repairs – Sarah – HackIT workstream DM

  • Celebrations:
    • Onboarding for Mobile Working has got off to a very good start, with a good proportion of gas engineers and electricians now mobile, and the carpenters next up.
    • We’re no longer reading from the outdated APIs, which were a snapshot of the pre-cyber-attack data. All property and tenure information is now read from the core data microservices, with contact names and numbers our next priority.
  • Challenges:
    • We’ve started the conversation about how the repairs team is going to evolve into a product team. There is inevitably going to be a period of time where we’re not going to be able to deliver value to users as quickly as we would like whilst we onboard new team members.
    • Devices. We have been very fortunate to have been given new phones for the DLO reactive plumbers and drainage, but there are still many more trades to onboard.

With the move to a single agency assisting HackIT across the whole of the programme, it definitely feels like we’re in a period of transition. This is likely to last a little while as new folks come into the team, both to restart MMH and at the programme level, and as we continue to rethink and discuss our overall shape and governance practices. This is combined with the current uncertainty around the impact of the “HackIT 3.0” department reorganisation. As always, though, the whole team is taking this as an opportunity to improve and is suggesting lots of great ways we can move forward.

Two of the things we’re discussing at the programme level are the introduction of some Programme Principles and/or a Programme Charter (based on the concept of a Team Charter).

For our governance we have a lot of things in the mix, but we know we definitely want to revisit our roles and responsibilities definitions, particularly for our Product Owners, and re-examine the purpose and execution of our Steering Group sessions, which currently act mostly as reviews of recent work rather than forums for addressing our difficult outstanding issues.

The good news is that I finally completed my self-imposed task of writing up a full “state of the nation” document for MTFH. The bad news is that it came out as a 14-page behemoth, which is too much for anyone in the team to digest. Instead I’ve almost finished pulling the salient points into a draft public programme backlog (previously I just kept track of things in my personal one), which I’ll be circulating very soon. It also has a lot of items in it, but I hope the way I’ve set it up will make it easy for us to slice it a number of ways: for the highest-priority things to cover in our new daily programme stand-ups, but also for a new separate regular session to look at items that could deliver huge value in the long term but are unlikely to be started now, as their benefits aren’t immediate.

There are many such items but one that I’m definitely keen to spend some time on is looking into how we can best support the new Link Work team that HackIT has put together. This is a multidisciplinary team from across our council service areas that is proactively supporting our most vulnerable residents.

As ever, there is a huge amount to do – both in the short term, and in working out how we can help our staff and deliver excellent support for our residents by introducing new functionality that we’ve never seen before.

Cloud Engineering weeknotes, 14 January 2022

This week demonstrated how the team has grown and matured: we abandoned our sprint. This is a rare event for an agile team, but the team knew that what was planned wasn’t going to bring value, so we swapped it for things that would. 

We focused on two main things. First, the firewalls and GlobalProtect. We’ve implemented new authentication groups in GlobalProtect; this is part of splitting GlobalProtect into internal and external, for apps hosted in our own AWS or in SaaS. The IP ranges have also been corrected, as the 172 range we had been using was an antipattern. 

The big change in the firewalls is the implementation of a VPN to secure access to the management console. This increases the security around who can access the firewalls while also simplifying that security. The team got a crash course in how the firewall routing is configured on Wednesday, when we mobbed on an issue connected to the websites – the other main focus of the week.

The new Ansible infrastructure wasn’t communicating properly with the Hub, so we had a big screen-sharing session, with Cintia driving the work on the firewalls while Frank and Stuart debugged the Ansible and everyone else learned a lot. We’ve agreed we should do more sessions like this. 

There have been some other support-related tasks this week, such as supporting Public Health with a bulk file transfer, and setting up a way for Pensions to download data from S3. But a lot of our other planned work is still blocked. We had planned to roll the Housing-Production account onto the Hub last night, but found out in the late afternoon that a SaaS supplier hadn’t put the new IP addresses on their allow-list. This would have led to severe disruption elsewhere in the council, so we have postponed the change. 

In an effort to “clear the decks” of our work in progress and our blocked work, we have cleared out our backlog and in our next sprint are focusing only on those tasks in hand, plus a couple of new essential tasks. The more we can clear the slate, the more we will be able to focus on bringing value to the platform and its users.