For more information about the HackIT Data Platform project, please have a look at this weeknote on the HackIT blog.
This week has been all about taking time to reflect on phase 1 of the Data Platform work and to celebrate our achievements since March 2020. We held a team retro and workshop in which we asked questions like ‘What are we proud of?’ These gave us the opportunity to consider how far we have come in terms of our personal development and how we have been able to apply that to our work on the platform.
We also asked ourselves a series of ‘How might we…’ questions: ‘How might we improve time to reliable insights?’ and ‘How might we onboard new users?’, to mention a couple. We are committed to continuing our work on the product roadmap, but for now we have come up with many ideas to guide our thinking.
This process has also led us to discuss how we communicate our progress with Hackney staff who, for various reasons, may be interested in the work the Data Platform team is doing. We decided to use a feedback form to gauge how we can best improve the way we communicate with everyone who is interested in our work. We’d appreciate it if everyone could take a couple of minutes to fill in the Google Feedback form.
We have continued to focus on ingesting Manage My Home housing data, moving to a Production environment and researching the usability of the Capita insight tool to ingest Council Tax data into the platform.
Making housing data available for reporting
Reporting from the suite of Modern Tools for Housing applications is currently very limited or not available at all, so we’ve been collaborating with developers in the Manage My Home team to get this data into the platform. The end goal is for this data to be used within dashboards that provide important information to service managers and product owners, for example how many tenures are created or amended within Manage My Home. There are four broad steps to this work:
- Getting the data into the platform – In progress
We are working with the Manage My Home team to develop a new, reusable event streaming process to collect data from the platform APIs (e.g. tenure API, person API) and put it into the data platform. We’ve been working on several elements of this process:
- Development of a Lambda which sends the message from the API to Kafka. Last week we ran end-to-end testing on this Lambda and fixed the issues that surfaced.
- Setting up Kafka, the tool we’re using to receive the data in the platform. During our setup we ran into issues with Kafka Connect, an extension to the AWS-managed implementation of Kafka that facilitates the deployment of connector modules. In our case we needed it to deploy an S3 writer module that allows Kafka to stream data directly to S3.
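To make the shape of this process concrete, here is a minimal sketch of the Lambda side: wrapping an API change in an event envelope and sending it to a Kafka topic. The field names, topic name and envelope structure are illustrative assumptions, not the team’s actual schema, and the producer is shown as a generic object with a Kafka-style `send` method (e.g. kafka-python’s `KafkaProducer` configured with a JSON serialiser).

```python
def build_event(entity_type, entity_id, event_type, body):
    """Wrap an API change (e.g. from the tenure or person API) in a simple
    event envelope. Field names here are illustrative, not the real schema."""
    return {
        "entityType": entity_type,   # e.g. "tenure" or "person"
        "entityId": entity_id,
        "eventType": event_type,     # e.g. "created", "updated"
        "body": body,                # the entity's current state
    }

def send_event(producer, topic, event):
    """Publish one event. `producer` is any object exposing a Kafka-style
    send(topic, value=...) and flush(), such as kafka-python's KafkaProducer
    created with value_serializer=lambda v: json.dumps(v).encode("utf-8")."""
    producer.send(topic, value=event)
    producer.flush()

# Inside a Lambda handler this might look like (topic name assumed):
#   send_event(producer, "tenure-events",
#              build_event("tenure", "abc-123", "updated", payload))
```

In a real Lambda the producer would typically be created once outside the handler so the Kafka connection is reused across invocations.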
We hope to have these elements in place this week so that we can test the whole process. However, this will only supply us with data from new events as they are streamed (e.g. a new person is created, or a tenure is updated); it won’t provide any historical data. We’ve developed a proposal for how we’d backfill this via changes to the platform APIs, which we’ll be discussing with the dev team this week.
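The S3 writer module mentioned above is typically deployed as a Kafka Connect S3 sink connector. The sketch below shows the kind of configuration involved, expressed as a Python dict; the connector class is Confluent’s widely used S3 sink, and every value (topic, bucket, region, flush size) is a placeholder rather than the team’s real setup.

```python
import json

# Illustrative Kafka Connect S3 sink configuration. All values below are
# placeholders; only the property names and the Confluent connector classes
# are standard.
s3_sink_config = {
    "connector.class": "io.confluent.connect.s3.S3SinkConnector",
    "topics": "tenure-events",                  # assumed topic name
    "s3.bucket.name": "data-platform-landing",  # placeholder bucket
    "s3.region": "eu-west-2",
    "storage.class": "io.confluent.connect.s3.storage.S3Storage",
    "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
    "flush.size": "1000",  # records accumulated before an S3 object is written
}

print(json.dumps(s3_sink_config, indent=2))
```

With this in place the connector drains each topic into S3 objects continuously, which is what lets downstream jobs treat the stream as files in the platform’s storage.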
- Transforming the data so it’s ready for analysis – Scoping
Once we have the data in the platform, we’ll need to write scripts to transform it so that it is ready for analysis; for example, bringing together multiple event streaming files into a single daily snapshot. We are still scoping how much transformation is required, as this will be led by the reporting needs. We started the week thinking the priority was providing tenancy information to the Housing Finance team, but that need can actually be met by joining up the Manage My Home and Housing Finance tools directly rather than going through the data platform. We’re planning to catch up with the MTFH team next week to get more clarity.
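One common way to turn a stream of change events into a daily snapshot is to keep, for each entity, its most recent event on or before the snapshot date. A minimal pure-Python sketch of that idea (the field names are assumptions; in practice this would run as a job over the event files in S3 rather than over an in-memory list):

```python
from datetime import date

def daily_snapshot(events, snapshot_date):
    """Collapse a list of change events into one row per entity, keeping the
    latest state on or before snapshot_date. Each event is a dict with
    'entityId', 'eventDate' (a date) and the entity state under 'body' --
    assumed field names, not the platform's real schema."""
    latest = {}
    for event in sorted(events, key=lambda e: e["eventDate"]):
        if event["eventDate"] <= snapshot_date:
            latest[event["entityId"]] = event["body"]
    return latest

events = [
    {"entityId": "t1", "eventDate": date(2021, 9, 1), "body": {"status": "active"}},
    {"entityId": "t1", "eventDate": date(2021, 9, 3), "body": {"status": "ended"}},
    {"entityId": "t2", "eventDate": date(2021, 9, 2), "body": {"status": "active"}},
]
snap = daily_snapshot(events, date(2021, 9, 2))
# On 2 September, t1 still shows as active; its later "ended" event is excluded.
```

The same pattern (sort by event time, take the last state per key) carries over directly to Spark or AWS Glue with a window function when the event files get large.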
- Connecting the data to a BI tool (e.g. Qlik) – Complete, using processes already in place
- Making dashboards – Not started
As with data transformation, we need more clarity on the reporting needs and priorities before we can scope the dashboards.
Providing a read-only production environment for data and processes
We have moved closer to our goal of running the data platform in a production environment. A key part of this work has been automating the Redshift configuration that was previously done manually by the team. Redshift, a data warehousing product, acts as a proxy through which visualisation tools such as Qlik can connect to data held within the Data Platform.
With the configuration process automated, departments onboarded to the platform will have consistent, reusable access to the data platform from multiple environments without waiting for manual configuration of the service. This also ensures that permissions and security remain consistent between setups.
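To give a flavour of what that per-department Redshift setup involves, here is a sketch that generates the kind of statements being automated: an external schema over a Glue catalogue database, a user group, and a usage grant. The department, database and role names are placeholders, and the team’s real automation may generate different or additional statements.

```python
def redshift_department_setup(department, glue_database, iam_role_arn):
    """Generate the Redshift SQL to give one department consistent access to
    its data: an external schema over the Glue data catalogue, a group, and a
    usage grant. All names are illustrative placeholders."""
    schema = f"{department}_schema"
    group = f"{department}_users"
    return [
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema} "
        f"FROM DATA CATALOG DATABASE '{glue_database}' "
        f"IAM_ROLE '{iam_role_arn}';",
        f"CREATE GROUP {group};",
        f"GRANT USAGE ON SCHEMA {schema} TO GROUP {group};",
    ]

statements = redshift_department_setup(
    "parking",                                      # placeholder department
    "parking-raw-zone",                             # placeholder Glue database
    "arn:aws:iam::123456789012:role/redshift-role", # placeholder IAM role
)
```

Running the same generator in every environment is what keeps access and permissions identical between, say, staging and production.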
We have also worked to migrate our Qlik EC2 instance from the Production API account to the Production Data Platform account. However, we’ve run into a few blockers:
- Some of the existing connections used by Qlik in the Production API account are not functioning as expected in the Production Data Platform account; we are testing and resolving these issues.
- The cloud engineering team has been configuring AppStream against the Production Data Platform version of Qlik to ensure continued service when we switch over to the new account. The AppStream instance has been created, and we now need to enable Google Single Sign-On to test functionality.
In addition to the requirements of moving Qlik to the Production Data Platform account, we have been working with the Cloud Engineering team to provide an alternative connection method that does not require AppStream. This is being provided in the form of a Global VPN solution which we have already successfully tried on our Staging account. We are now waiting for the implementation of this solution on the Production accounts which we hope to trial after the migration to the Production Data Platform account.
Making Council Tax data available for reporting
We’ve been trying to get data from Academy (our application for Council Tax, business rates and Housing Benefit) into the platform, as these are crucial services that need reporting as well as strategic datasets with a lot of potential for cross-service use. After looking at several ways to ingest data from Academy (see this weeknote for more info), we have decided to investigate the possibilities offered by Academy’s Insight tool. We don’t plan to use the Insight tool itself: it is a separate business intelligence tool for creating dashboards, and would silo off this data rather than help democratise it. However, setting the tool up creates a reporting database on top of the live database, and it’s this reporting database that we’re now trying to connect to the data platform.
We weren’t sure whether it would contain all the data we might need, but initial investigations suggest it has most of the key tables used in previous analytical outputs relating to Council Tax, Housing Benefit and Business Rates.
The next step is to determine how to best ingest the data into the platform so that we can fully test the completeness of the data to meet the requirements of colleagues in Revenues and Benefits. We will need a tech spike to determine this, but are prioritising the MTFH data first.
Next week we plan to:
- Continue to work with the Modern Tools for Housing teams to ingest data into the platform, and understand more about their end reporting requirements so that we can plan any transformation that may be needed
- Start a troubleshooting section of our Playbook, starting with a partitioning issue that’s cropped up twice now for parking users when data tables get very large