Data Platform weeknotes 15, 07.01.2022

For more information about the HackIT Data Platform project please have a look at this weeknote on the HackIT blog.

We had a slightly quieter period in the run-up to Christmas, with lots of team members taking time to rest and rejuvenate after a very busy year. We now start 2022 in an introspective mood as we continue to take stock of our achievements and reassess our ongoing project priorities and ways of working.

We are holding a team-wide project retro in which we hope not only to celebrate our achievements but also to discuss aspects of the project we need to improve or iterate on. In addition, we are very much looking forward to presenting at the HackIT Strategy Show & Tell on the 13th of January.

We have continued to focus on Tascomi Planning data, Manage My Home housing data and our move to a Production environment.

Making housing data available for reporting

We’ve been collaborating with developers from the Manage My Home team to set up an event streaming process that puts data from the platform APIs (e.g. the tenure API and person API) into the data platform. To receive this data, we’ve set up Kafka, an open-source framework for storing, reading and analysing streaming data. We continue to work with the Manage My Home team to create a Lambda which sends the data to Kafka, and from there into the data platform.
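
To make this concrete, here is a minimal sketch of what such a Lambda could look like. This is illustrative only: the broker addresses, topic names and event shape are assumptions, not the team's actual configuration.

    # Illustrative Lambda handler that forwards platform API events to Kafka,
    # using the kafka-python client. All names here are hypothetical.
    import json
    import os

    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=os.environ["KAFKA_BROKERS"].split(","),  # MSK brokers
        security_protocol="SSL",  # MSK endpoints are typically TLS-encrypted
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    def handler(event, context):
        # e.g. a change event emitted by the tenure API or person API
        topic = event.get("entityType", "tenure_api")  # hypothetical routing
        producer.send(topic, value=event)
        producer.flush()  # ensure delivery before the Lambda returns
        return {"statusCode": 200}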

We have had some issues with the set-up, including having to find an alternative way to define the infrastructure as code. This is because we’re using an AWS-managed version of Kafka (Amazon MSK) which doesn’t fully support Terraform or CloudFormation. We’ve now determined that we need to use an API for this, and we hope this work will be completed soon.
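
As an illustration of what this API-driven setup can look like, resources that Terraform cannot yet manage (such as topics) can be created through Kafka's admin API. Below is a minimal sketch using the kafka-python admin client; the broker address, topic name and settings are assumptions.

    # Hypothetical topic creation against the managed Kafka cluster.
    from kafka.admin import KafkaAdminClient, NewTopic

    admin = KafkaAdminClient(
        bootstrap_servers="b-1.example.kafka.eu-west-2.amazonaws.com:9094",  # hypothetical
        security_protocol="SSL",
    )
    admin.create_topics([
        NewTopic(name="tenure_api", num_partitions=3, replication_factor=3),
    ])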

Providing a read-only production environment for data and processes

We have moved closer to our goal of running the data platform in a production environment. A key part of this work has been automating the Redshift configuration, which was previously done manually by the team. Redshift, a data warehousing product, can be used by visualisation tools such as Qlik as a proxy to connect to data held within the Data Platform.

With the configuration process automated, departments onboarded to the platform will have consistent, reusable access to the data platform from multiple environments without waiting for manual configuration of the service. This will also ensure that permissions and security remain consistent between environments.
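
As a rough sketch of the kind of configuration now automated, the statements below register refined-zone data as an external schema in Redshift and grant a department access. The cluster, schema, role and group names are all hypothetical, and the real automation runs within our infrastructure code rather than as an ad-hoc script.

    import boto3

    client = boto3.client("redshift-data")

    statements = [
        # Expose a Glue catalog database to Redshift via an external schema.
        "CREATE EXTERNAL SCHEMA IF NOT EXISTS planning_refined "
        "FROM DATA CATALOG DATABASE 'planning-refined-zone' "
        "IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-spectrum-role';",
        # Give the department's analysts consistent read access.
        "GRANT USAGE ON SCHEMA planning_refined TO GROUP planning_analysts;",
    ]

    for sql in statements:
        client.execute_statement(
            ClusterIdentifier="data-platform",  # hypothetical cluster name
            Database="data_platform",
            SecretArn="arn:aws:secretsmanager:eu-west-2:123456789012:secret:redshift-admin",
            Sql=sql,
        )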

We have also worked to migrate our Qlik EC2 instance from the Production API account to the Production Data Platform account. However, we’ve run into a few blockers:

  • Some of the existing connections used by Qlik in the Production API account are not functioning as expected in the Production Data Platform account, and we are testing and resolving these issues.
  • The cloud engineering team has been configuring AppStream against the Production Data Platform version of Qlik to ensure continued service when we switch over to the new account. However, the account that holds AppStream has reached a soft limit on the number of running instances, and we are awaiting resolution from AWS.

In addition to moving Qlik to the Production Data Platform account, we have been working with the Cloud Engineering team to provide an alternative connection method that does not require AppStream. This takes the form of a global VPN solution, which we have already trialled successfully on our Staging account. We are now waiting for this solution to be implemented on the Production accounts, and hope to trial it there after the migration to the Production Data Platform account.

Making Tascomi data available to planning analysts so they can produce the reports they need

A daily snapshot of the Planning data is now being produced in the refined zone, which will make changes in Tascomi available to users within 24 hours. Because we only process the records that have changed, we also reduce storage costs and compute resources. Data quality checks have been implemented to ensure that incoming data is of an acceptable standard before it moves through the data pipeline, so users, including analysts, can be reassured that the data has been checked for issues such as duplicate records.
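
For illustration, the pattern looks something like the PySpark sketch below: check the day's increment for duplicates, then apply only the changed records on top of the previous snapshot. The bucket paths, dates and id column are illustrative, not our actual job code.

    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("tascomi-daily-snapshot").getOrCreate()

    # Hypothetical S3 locations for the day's changed records and yesterday's snapshot.
    increment = spark.read.parquet("s3://dataplatform-raw/tascomi/applications/date=2022-01-07/")
    snapshot = spark.read.parquet("s3://dataplatform-refined/tascomi/applications_snapshot/date=2022-01-06/")

    # Quality check: stop the pipeline if the increment contains duplicate records.
    duplicates = increment.groupBy("id").count().filter(F.col("count") > 1)
    if duplicates.count() > 0:
        raise ValueError("Duplicate ids found in today's Tascomi increment")

    # Keep unchanged rows from the old snapshot, then add the changed records.
    updated = (
        snapshot.join(increment.select("id"), on="id", how="left_anti")
        .unionByName(increment)
    )
    updated.write.mode("overwrite").parquet(
        "s3://dataplatform-refined/tascomi/applications_snapshot/date=2022-01-07/"
    )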

For more information about how the Tascomi Planning Data Ingestion process works, please have a look at our playbook documentation, which has been recently updated with information about the daily snapshot.

The data is available in Qlik and the majority of the Planning KPIs that existed prior to the cyberattack have been rebuilt. We’ll be supporting Planning colleagues in producing the reports they need, e.g. on application response times.

Up next

  • Take the opportunity of a new year, a new project phase and a new contract to reflect as a team, and share this at the Strategy Show & Tell
  • Continue to trial Kanban as a way of planning our team’s work
  • Continue to work with the Modern Tools For Housing teams to ingest data into the platform, and understand more about their end reporting requirements so that we can plan the transformation that may be needed
  • Investigate how the Capita Insight tool, and the databases it creates, could help us ingest Council Tax data into the platform