Data Platform Project Weeknote 24

About the Data Platform

The vision for the Data Platform is to rebuild a better data infrastructure: a secure, scalable and reusable cloud-based platform that brings together the council's key data assets. This will enable us to democratise access to data (where appropriate), use technology to enable deeper insight, and derive greater value from our data to improve the lives of residents in Hackney. For more information, see our Playbook and our Glossary.

How we’ve been building the platform this week:

Creating a reusable process to ingest data from an MS SQL database so that we can make Council Tax data available for reporting and reuse:

We now have a fully automated, reusable process for ingesting data from databases onto the Platform. We have successfully ingested parking data from Geolive, making it available to our Parking analysts. Geolive is the corporate GIS database, held in Postgres/PostGIS. It contains data from different departments, including Parking orders (these are managed in Parkmap and exported nightly as GIS flat files loaded into Geolive).
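
To give a flavour of what the reusable process does under the hood, here is a minimal sketch of a Glue job that reads one table over JDBC and lands it in the raw zone as Parquet. The connection details, credentials handling, table and S3 path are illustrative assumptions, not our actual configuration:

```python
# A minimal sketch of a reusable Glue ingestion job. The connection details,
# credentials handling, table and S3 path are illustrative assumptions.
import sys

from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["source_table", "raw_zone_path"])
glue_context = GlueContext(SparkContext.getOrCreate())

# Read one table from the source database over JDBC.
source = glue_context.create_dynamic_frame.from_options(
    connection_type="postgresql",
    connection_options={
        "url": "jdbc:postgresql://geolive.example.internal:5432/geolive",
        "dbtable": args["source_table"],
        "user": "ingest_user",   # in practice, pulled from a secrets store
        "password": "***",
    },
)

# Land the table in the raw zone as Parquet, ready to crawl and query.
glue_context.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": args["raw_zone_path"]},
    format="parquet",
)
```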

We’ve also significantly improved the cost and time efficiency of the ingestion process for Academy data, which previously took more than 15 hours to ingest all 291 tables. The new process splits the tables across several Glue jobs that run concurrently, each taking no more than a few hours to complete. The data is then programmatically split into the raw zones of two new departments, Benefits & Housing Needs and Revenues, where the relevant analysts can access, query (via Athena/Redshift) and prepare the data for dashboards in BI tools such as Qlik. The work is currently being reviewed by our team and should be live very soon.
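
The splitting and routing logic is simple in principle. Here is a toy sketch, assuming a hypothetical table-to-department mapping and Glue job name (the real mapping covers all 291 tables):

```python
import boto3

# Hypothetical table-to-department routing; the real mapping covers 291 tables.
TABLE_OWNERS = {
    "ctax_account": "revenues",
    "nndr_account": "revenues",
    "hb_claim": "benefits-housing-needs",
}

def raw_zone_path(table: str) -> str:
    """Route a table's output to the raw zone of the department that owns it."""
    return f"s3://data-platform-raw-zone/{TABLE_OWNERS[table]}/academy/{table}/"

def batches(tables, size):
    """Chunk the full table list, one chunk per concurrent Glue job run."""
    for i in range(0, len(tables), size):
        yield tables[i:i + size]

glue = boto3.client("glue")
for chunk in batches(sorted(TABLE_OWNERS), 50):
    glue.start_job_run(
        JobName="academy-ingestion",  # illustrative job name
        Arguments={"--tables": ",".join(chunk)},
    )
```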

Creating a reusable process to ingest data from our in-house applications through event-streaming so that we can make housing data available for reporting: 

We now have a working setup for the Kafka implementation in Amazon MSK. We have been working to productionise the infrastructure and move the setup fully into our Terraform repository so that we can deploy it to Pre-Production and continue testing with Manage My Home. At the same time, we’ve continued talking with the AWS team to evaluate whether the AWS Glue Schema Registry could work as a better alternative to the self-managed Schema Registry we currently have running. With the self-managed Schema Registry we would need to make some changes to the Lambda function implemented by the Manage My Home team; with the AWS Glue Schema Registry, that additional work wouldn’t be required.
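
To make the trade-off concrete: with the self-managed registry, a producer serialises events against a registered Avro schema along these lines. This is a sketch using the confluent-kafka client; the topic, schema, broker and registry endpoints are invented for illustration:

```python
# Sketch of a producer against a self-managed (Confluent-style) Schema
# Registry. Topic, schema, broker and registry endpoints are invented.
from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import StringSerializer

SCHEMA = """
{
  "type": "record",
  "name": "TenureUpdated",
  "fields": [
    {"name": "tenure_id", "type": "string"},
    {"name": "updated_at", "type": "string"}
  ]
}
"""

registry = SchemaRegistryClient({"url": "https://schema-registry.example:8081"})
producer = SerializingProducer({
    "bootstrap.servers": "msk-broker-1.example:9092",
    "key.serializer": StringSerializer("utf_8"),
    "value.serializer": AvroSerializer(registry, SCHEMA),
})

# Events are validated against the registered schema at serialisation time.
producer.produce(
    topic="mmh-tenure-events",
    key="tenure-123",
    value={"tenure_id": "tenure-123", "updated_at": "2021-09-17T10:00:00Z"},
)
producer.flush()
```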

Migrating Qlik into the platform infrastructure and moving away from AppStream so that we can improve the user experience for staff:

Following the Technical Design Authority’s decision to proceed with Qlik on the web behind a WAF, we were also asked to look at the user needs that lead to downloading of data, and to explore whether these can be met through other means. Last week a number of users from the Children and Families Service and Housing explained how important the download feature is for carrying out some key functions. However, there are also opportunities to reduce the need for downloads if we can automate saving the data into Google Sheets for them to use. We are working with analysts in the service areas to deliver this whilst the Cloud Engineering Team configures the WAF.
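
One way such an automation could look, sketched on the assumption that we’d use a Google service account with the gspread library (the sheet name and data are placeholders):

```python
import gspread

# Hypothetical sketch: push a prepared extract into a Google Sheet so the
# service no longer needs to download it from Qlik by hand.
rows = [
    ["case_id", "status"],  # header row
    ["123", "open"],        # placeholder data
]

gc = gspread.service_account(filename="service-account.json")
worksheet = gc.open("Caseload extract").sheet1
worksheet.update(range_name="A1", values=rows)
```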

How the platform has been used this week:

Making Academy data available in Qlik so that analysts can use it:

Now that Academy data is in the relevant departments’ raw zones, we are able to query it in Redshift from Qlik and recreate the majority of the data pipelines that were in use prior to the cyber attack. One focus has been recreating the “Housing Benefit and Council Tax Support profiler” in Qlik, which provided the service with an overview of the current claim caseload. We have been able to reproduce the data and refresh the dashboard, and will spruce it up before sharing it with the service. Housing Benefit data is also an integral source for the Supporting Families team (formerly known as Troubled Families), who use it to evidence that cases have moved back into employment and to claim payments from central government. We will shortly reinstate the reports for the team so they can restart the Payment by Results process.
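
Behind the Qlik connection, the raw-zone tables are exposed through Redshift, so the same data can also be queried programmatically, for example via the Redshift Data API. A sketch, with illustrative cluster, database and table names:

```python
import boto3

client = boto3.client("redshift-data")

# Illustrative cluster, database and table names.
response = client.execute_statement(
    ClusterIdentifier="data-platform",
    Database="data_platform",
    DbUser="analyst",
    Sql="SELECT claim_status, COUNT(*) FROM hb_claims GROUP BY claim_status;",
)

# The Data API is asynchronous: check the status (in practice, poll until
# it reaches FINISHED), then fetch the result set.
if client.describe_statement(Id=response["Id"])["Status"] == "FINISHED":
    result = client.get_statement_result(Id=response["Id"])
```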

Steve Farr has also been working with Dave Ellen, an analyst in Revenues, to recreate commonly used queries relating to Council Tax and Business Rates. Steve has successfully recreated a number of Dave’s queries in Athena and will be carrying on this work, exploring how to make “trusted” data sets available to managers in the service, and onboarding Dave to use Athena to query the data on the Platform.
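
For onboarding purposes, the same queries can be driven from code as well as from the Athena console. A sketch using boto3, with a placeholder database, query and results location:

```python
import time

import boto3

athena = boto3.client("athena")

# Placeholder database, query and results location.
run = athena.start_query_execution(
    QueryString="SELECT account_ref, balance FROM ctax_accounts LIMIT 10;",
    QueryExecutionContext={"Database": "revenues_raw_zone"},
    ResultConfiguration={"OutputLocation": "s3://data-platform-athena-results/"},
)
query_id = run["QueryExecutionId"]

# Poll until the query finishes, then fetch the rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)[
        "QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```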

We created a reusable training module to help onboard users so that more staff can use the platform to complete their work: 

This week we continued our training programme and ran module 3 training for our team members. We’ve used the feedback from the training to refine our processes. We’ve also been putting together a form to roll out to people who would benefit from being trained to use the data platform for data analysis.

Up Next:

  • Continue to develop trusted datasets for Council Tax and recover previous data products dependent on Housing Benefit data
  • Continue testing our event streaming process with the Manage My Home team
  • Support the Cloud Engineering team to implement Qlik over the web, and draft another paper for the Technical Design Authority about the downloading of data
  • Investigate how we make Alloy data available to analysts in Environmental Services through the Alloy API