Data Platforms Weeknotes 18: 01/02/22

About the Data Platform
The vision for the Data Platform is to rebuild a better data infrastructure to deliver a secure, scalable and reusable cloud-based platform that brings together the council’s key data assets. This will enable us to democratise access to data (where appropriate), use technology to enable deeper insight, and derive greater value from our data to improve the lives of residents in Hackney. For more information, see our Playbook.

Making housing data available for reporting
For some time, we’ve been collaborating with the Manage My Home team from Modern Tools for Housing (MTFH) to set up an event streaming process that gets data into the Data Platform. (If you missed our latest Show & Tell, you can watch our Tech Lead Engineer, James, give a good overview here.) This week we celebrated the first part of our event streaming process working with the Manage My Home application: an event is generated in the front end of Manage My Home, passed to a Lambda function, and then sent to Kafka in the Data Platform account.
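
To illustrate the flow from application event to Lambda to Kafka, here is a minimal Python sketch of a Lambda-style handler that serialises an incoming event and hands it to a Kafka producer. It is not our actual implementation: the topic name, the event fields and the `send` callable (which stands in for a real Kafka producer client) are all assumptions made for illustration.

```python
import json
from typing import Any, Callable, Dict

# Hypothetical topic name, not taken from the real configuration.
TOPIC = "example_topic"


def build_message(event: Dict[str, Any]) -> bytes:
    """Serialise the incoming event as UTF-8 encoded JSON."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def make_handler(send: Callable[[str, bytes], None]):
    """Return a Lambda-style handler that forwards each event via `send`.

    `send` is a placeholder for whatever a real Kafka producer exposes;
    injecting it keeps this sketch runnable without a Kafka cluster.
    """

    def handler(event: Dict[str, Any], context: Any = None) -> Dict[str, int]:
        send(TOPIC, build_message(event))
        return {"forwarded": 1}

    return handler


if __name__ == "__main__":
    # Stand-in producer that just records what it was asked to send.
    sent = []
    handler = make_handler(lambda topic, payload: sent.append((topic, payload)))
    handler({"id": "123", "eventType": "example"})
    print(sent)
```

Passing the producer in as a function is a choice made for this sketch only, so that the forwarding step can be exercised without any infrastructure.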

Our progress this week was a breakthrough, but unfortunately we are still having trouble with the last step of this process, when Kafka writes the received data out to S3. We had this working before, but it stopped working again after we made various changes, so this week we need to focus on debugging and getting the full end-to-end process working. We will also continue our discussions with the MTFH and dev teams to agree on an approach to state syncing, which will enable us to ingest historical data, not just new events, and to re-synchronise our data if it falls out of sync with the Manage My Home application.
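
We have not yet agreed an approach to state syncing, so the following is only a sketch of one possible idea: compare a snapshot from the source application with what the platform holds, and work out which records need to be loaded or refreshed. The record shapes, the use of an ID as the key, and the `reconcile` function itself are hypothetical.

```python
from typing import Dict, List, Tuple


def reconcile(
    source: Dict[str, dict], platform: Dict[str, dict]
) -> Tuple[List[str], List[str], List[str]]:
    """Compare two snapshots keyed by record ID.

    Returns three sorted lists of IDs:
    - missing: in the source but not yet in the platform
    - stale: in both, but with different contents
    - orphaned: in the platform but no longer in the source
    """
    missing = sorted(k for k in source if k not in platform)
    stale = sorted(
        k for k in source if k in platform and source[k] != platform[k]
    )
    orphaned = sorted(k for k in platform if k not in source)
    return missing, stale, orphaned


if __name__ == "__main__":
    # Made-up records purely for illustration.
    source = {"a": {"v": 1}, "b": {"v": 2}, "c": {"v": 3}}
    platform = {"b": {"v": 2}, "c": {"v": 9}, "d": {"v": 4}}
    print(reconcile(source, platform))
```

A full comparison like this would only be practical for modest data volumes; whatever approach we agree on may look quite different.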

Ingesting and using Noiseworks data from the platform

We have completed some work to make data and reporting available for the Noiseworks team, and to test our ‘time to insight’ in the process. Noiseworks is an application for managing noise complaints which has recently gone live in the Environmental Enforcement team. To get their data into the platform, we followed a similar process to the one we used to ingest Liberator (parking) data: we liaised with the third-party supplier to create an AWS S3 bucket for them to drop files into, then exposed the data to be consumed in Qlik.

The service went live last Friday and we were able to get dashboards to the team by the following Tuesday. This is an example of how reusable processes can help achieve ‘faster time to insights’ – our North Star objective.

[Image: an example Noiseworks dashboard]

Making additional Planning (Tascomi) data available for reporting
We’ve previously set up a process to ingest planning data from the Tascomi API, but realised that the API didn’t provide all of the data we needed. The supplier recently added new endpoints for us to access the additional tables we requested, so this week we onboarded two new data analysts (Marta and Tim from D&I) onto the platform to help us ingest those tables. This involved them learning to edit a data dictionary, use a Terraform script, add data quality checks to the PySpark scripts, and submit a pull request on GitHub. The analysts’ feedback was that the process was fairly straightforward and that the documentation they were able to reference in the Playbook was useful. This has helped to reassure us that the reusable processes we’ve been building are indeed reusable!

Improving analysts’ experience of using the Data Platform

We have found that onboarding analysts onto the Data Platform can be challenging because of the number of new tools and technologies they need in order to gain access. Analysts have had to switch between different interfaces and use different dialects of languages such as SQL when creating scripts and Glue jobs. This adds complexity on top of the many new skills that analysts already have to learn to use the platform.

We hope that by using an online notebooking tool, some of this complexity can be eliminated. However, we first have to assess the notebooking tools that are available to us against the needs of our users. We have started this process, with several members of the team helping to evaluate tools.

Up Next:

  • Fixing the Kafka issue that’s blocking our end-to-end streaming process for housing data
  • Agreeing an approach to getting historical data from platform APIs
  • Running a Tech Spike on how best to ingest Council Tax (Academy) data into the Data Platform
  • Reviewing the criteria and objectives for notebooking tools as a team in our weekly Collab session