Data Platform Project Weeknotes 13: 23.11.2021

For more information about the HackIT Data Platform project please have a look at this weeknote on the HackIT blog.

Improving the Data Platform playbook

We have spent some more time restructuring our Data Platform playbook. After thinking about the particular needs of data scientists, analysts, engineers and managers, and mapping the common and unique parts of the playbook that each group may need to access, we have started restructuring the playbook around a much clearer navigation menu for these users. We see this as a two-part process: restructuring the menu, then refining and adding the necessary content.

One example of content added to the playbook is the process for simplifying the creation of Glue jobs in code (see weeknote 12 for more information). After writing the instructions for the playbook, we spent some time making sure they are user-friendly for analysts with a broad range of familiarity with tools such as Terraform and GitHub.

This testing is set to continue with our colleagues in the Parking team, but it has already given us a lot of insight into how we can make the playbook more user-friendly and accessible.


Collaborating with Manage My Home

Last week, members of the data platform team presented a proposal to the HackIT Technical Architecture meetup on the benefits of using Kafka over an AWS S3 bucket for the event streaming process.


Kafka is open-source software that provides a framework for storing, reading and analysing streaming data. We looked at the positives and negatives of introducing Kafka in place of the current SNS/SQS solution, and believe that Kafka provides a more reliable and scalable way to meet the needs of the data platform. After some lively discussion and debate, we are waiting to find out the next steps and are keen to start working with the Manage My Home team on event streaming as soon as possible. Ultimately this will enable us to get the data into a BI tool like Qlik so that the Manage My Home team understands how their tool is being used.
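
To make that comparison a little more concrete, here is a minimal sketch of what publishing an application event to a Kafka topic can look like in Python, using the kafka-python library. This is not our implementation: the broker address, topic name and event shape are all illustrative.

```python
# A minimal sketch of a Kafka event producer using kafka-python.
# Broker address, topic name and event payload are illustrative only.
import json

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",  # placeholder broker address
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Publish a hypothetical "tenancy updated" event to a topic.
producer.send("mmh-tenancy-events", {"tenancy_id": "12345", "action": "updated"})
producer.flush()  # block until the event has actually been sent
```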

Exploring our Roadmap and Ways of Working

We held another workshop to further refine our product roadmap. As a team we identified all the possible user needs that various analysts, engineers and managers might have when using the platform. We then looked for commonalities between these user groups and mapped the needs on an affinity map. This process enabled us to order user needs by priority, complexity and difficulty.

We have also held a workshop to reflect on our current team pattern of agile working and ceremonies. There was a lot of debate about Scrum versus Kanban, and about the best way to estimate the complexity and duration of tasks during planning. We have come up with some changes which we hope will make the planning process more efficient. However, we acknowledge that this is an evolving process, and one whose success we will reflect on in the near future.

Ingestion and use of Tascomi planning data

Work is still ongoing to change the Tascomi ingestion process so that it stores daily data snapshots in the platform. This sprint, we are also attaching data quality checks to the process. This is an opportunity to test and refine an entry in our Playbook.
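
For flavour, a data quality check of this kind could look something like the PySpark sketch below. The column name, S3 path and rules are hypothetical rather than taken from our pipeline.

```python
# A minimal sketch of data quality checks attached to an ingestion step.
# Table path and column names are illustrative, not our real schema.
from pyspark.sql import DataFrame, SparkSession


def check_quality(df: DataFrame) -> None:
    """Fail the job early if the incoming increment looks wrong."""
    # The increment should never be empty.
    assert df.count() > 0, "Increment is empty"
    # Key columns should never be null.
    nulls = df.filter(df["application_id"].isNull()).count()
    assert nulls == 0, f"{nulls} rows have a null application_id"
    # Primary keys should be unique within the increment.
    distinct = df.select("application_id").distinct().count()
    assert df.count() == distinct, "Duplicate application_id values in increment"


spark = SparkSession.builder.getOrCreate()
increment = spark.read.parquet("s3://bucket/tascomi/increment/")  # placeholder path
check_quality(increment)
```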

Adam Burnett from the Data and Insight team is deconstructing previous QlikView reports to understand the business logic behind key KPIs, sourcing the relevant data in the Tascomi tables and recreating the reports for the Planning team to review. In some cases this means identifying new datasets that need to be added to our daily loads.

Email ana.bebic@hackney.gov.uk if you have any questions about the Data Platform.

Data Platform Weeknotes: 11.11.2021

For more information about the HackIT Data Platform project please have a look at this weeknote on the HackIT blog.

Improving the Data Platform playbook

We have spent some time this week rethinking the structure of our Data Platform playbook. Although we are happy with a lot of the content that has already been added, we felt we needed to spend some time looking at particular user journeys and how to make the experience of using the playbook as comprehensive and accessible as possible.

We have thought about the particular needs of data scientists, analysts, engineers and managers and mapped the common and unique parts of the playbook that they may need to access. We are in the process of restructuring the playbook so that it uses a much clearer navigation menu for these users.


Collaborating with Manage My Home

We’ve met with the Modern Tools for Housing team to agree on a reusable process to stream data from their application. We hope that our data platform engineers will be able to help the Manage My Home team with this process.

We have put together a document which explores the benefits of using Kafka over an S3 bucket for the data streaming process. Kafka is open-source software that provides a framework for storing, reading and analysing streaming data. We also need to consider how Kafka would work with the current .NET-based API architecture, and whether it has any limitations in that context.

Simplifying creating Glue jobs in code

A Glue ‘job’ refers to the business logic that performs the extract, transform, and load (ETL) work in AWS (Amazon Web Services) Glue. When you start a job, AWS Glue runs a script that extracts data from sources, transforms the data, and loads it into targets.
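
For readers unfamiliar with what such a script looks like, here is a minimal, generic Glue job skeleton in Python. The database, table and bucket names are placeholders, not part of our platform.

```python
# A minimal sketch of a Glue job script: read from a source, apply a
# transformation, write to a target. All names below are placeholders.
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Extract: read a table registered in the Glue Data Catalog.
source = glue_context.create_dynamic_frame.from_catalog(
    database="raw-zone", table_name="example-table"
)

# Transform: for example, drop columns that aren't needed downstream.
transformed = source.drop_fields(["unwanted_column"])

# Load: write the result out as parquet.
glue_context.write_dynamic_frame.from_options(
    frame=transformed,
    connection_type="s3",
    connection_options={"path": "s3://refined-zone/example-table/"},
    format="parquet",
)

job.commit()
```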

This week we have worked to ensure that the process for creating Glue jobs is clearly documented so analysts can easily create them. We are also working on some Terraform (a tool for managing the entire lifecycle of infrastructure using code) templates that can be reused by copying and pasting, requiring no prior knowledge of Terraform. We want to make sure code is organised so that an analyst working for a specific department knows where to put the code relating to their department.

Up next, we will be testing the process with analysts to make sure our work meets their needs and to decide whether further refinement is required.

Creating a refined (Tascomi) planning data set with a daily snapshot

Now that Tascomi data is getting into the platform every day, we are in a position to refine our workflow. This means that only daily data increments will be processed (through parsing and refinement) each day. These increments will then be merged with the previous version of the dataset to create a full daily snapshot. We’re still testing this approach and hoping it will make Tascomi data (both current and historic) easy for planning analysts to access.
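
As a rough sketch of that merge step (assuming a hypothetical id key and last_updated column), the daily increment can be combined with the previous snapshot like this in PySpark:

```python
# A minimal sketch of merging a daily increment into the previous snapshot
# to produce a new full snapshot. Paths and column names are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
snapshot = spark.read.parquet("s3://bucket/tascomi/snapshot/")    # yesterday's full data
increment = spark.read.parquet("s3://bucket/tascomi/increment/")  # today's new/amended rows

# Union the two, then keep only the most recent row per record id.
combined = snapshot.unionByName(increment)
latest_first = Window.partitionBy("id").orderBy(F.col("last_updated").desc())
new_snapshot = (
    combined.withColumn("row_num", F.row_number().over(latest_first))
    .filter(F.col("row_num") == 1)
    .drop("row_num")
)
new_snapshot.write.mode("overwrite").parquet("s3://bucket/tascomi/snapshot_new/")
```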

Using Redshift with Tascomi data

Redshift is a cloud-based data warehouse product designed for large-scale data storage and analysis. Exposing the Tascomi data in the Redshift cluster means that we now have daily loads into Qlik (a business analytics platform), with only the latest version of the data stored in tables for analysts to use. We have started to create a data model that pre-builds the associations between the tables for easier interrogation. Analysts can also use Redshift connectors with Google Data Studio.
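
As an illustration, querying those tables from Python might look like the sketch below, using the redshift_connector driver. The cluster endpoint, credentials and table name are placeholders.

```python
# A minimal sketch of querying Tascomi tables exposed in Redshift.
# Connection details and the table name are placeholders, not real values.
import redshift_connector

conn = redshift_connector.connect(
    host="example-cluster.abc123.eu-west-2.redshift.amazonaws.com",  # placeholder
    database="data_platform",  # placeholder database name
    user="analyst",
    password="...",
)
cursor = conn.cursor()
cursor.execute("SELECT COUNT(*) FROM tascomi.planning_applications;")  # illustrative table
print(cursor.fetchall())
```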

Next Show & Tell – Friday 12th November

Our next Show & Tell is on the 12th of November, from 12 to 12.30pm. Come along to find out more about what we are up to, and invite others who may be interested (the calendar invite is open). Email ana.bebic@hackney.gov.uk if you need any help. For anyone who can't make it, don't worry: we will record the session and post it on Currents/Slack afterwards.

Data Platform Project Weeknotes: 29.10.2021

What is the Data Platform project?

Hackney has long viewed its data as a strategic asset that has the potential to deliver insights to help us make better decisions and improve the lives of our residents. Behind the scenes of a statistic in a report or a dashboard are the tools, processes and infrastructure needed to get access to our data, move it, store it and transform it so that it can be used for analysis. That’s where a data platform comes in.

A data platform is an integrated technology that allows data located in data sources to be governed, accessed and delivered to users, data applications, or other technologies. We’re using the recovery from the cyber attack as an opportunity to ‘build back better’ for the future and deliver a secure, scalable, reusable cloud-based data infrastructure that brings together the council’s key data assets. We want our data platform to help us democratise access to data (where appropriate), use technology to enable deeper insight, and derive greater value from our data to improve the lives of residents.

In practice, our data platform will be composed of a number of different elements:

  • Data Lake – a centralised repository to store data all in one place, and a set of loosely coupled processes to ingest, process and publish that data (see diagram below).
  • Playbook – documentation of the platform’s tools and processes, along with best practices for interacting with them
  • Data catalogue – documentation and metadata about specific datasets, columns etc.

Feedback on outputs from the Data Platform

The Data Platform team previously worked to bring together disparate repairs data into a single, cleaned repairs dataset that could be joined with other data on resident vulnerability. Our goal was to identify households that hadn't had a recent repair and were potentially more vulnerable, so that the council could make proactive contact to check in on both the needs of the resident and the condition of the property. Through this work, we were able to give the new Link Work team (who provide targeted, holistic, and proactive support to residents) a list of residents who were aged 70+ and living alone, and who hadn't had a repair in two years or more. Link Workers have started to make calls to these residents and we have been receiving some excellent feedback on how well targeted these interventions have been.
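
In spirit, the selection logic behind that list reduces to something like the following pandas sketch. The column names and the tiny example dataset are made up purely for illustration.

```python
# A minimal sketch of the kind of selection used to build the Link Work list.
# Columns and data are illustrative, not the real joined repairs dataset.
from datetime import datetime, timedelta

import pandas as pd

two_years_ago = datetime.now() - timedelta(days=730)

# One row per household, joined to the date of their last repair.
residents = pd.DataFrame({
    "age": [72, 45, 81],
    "lives_alone": [True, False, True],
    "last_repair_date": pd.to_datetime(["2018-01-10", "2021-06-01", "2019-03-22"]),
})

# Aged 70+, living alone, no repair in two years or more.
cohort = residents[
    (residents["age"] >= 70)
    & residents["lives_alone"]
    & (residents["last_repair_date"] < two_years_ago)
]
print(cohort)
```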

The Link Workers recently dealt with the needs of a resident in her mid-70s, living alone, who has health conditions that limit her mobility. She had outstanding repairs issues and her property was very cluttered, but she had worried about raising things with her TMO (Tenant Management Organisation) because she was afraid of how people would perceive her. She had also struggled with navigating the benefits system to claim Attendance Allowance.

The Link Work team proactively reached out to her because of the insights they were able to surface from the data platform. She said it was ‘a blessing’ to have someone check in on her and felt a weight had been lifted. She's now receiving food support, has had a visit from a therapeutic decluttering service, and is getting financial advice.

Our challenges

Our main goal was to start to ingest Council Tax data from Academy. In addition, we continued our work on making sure analysts are able to use planning data in the platform with limited support from the team.

Meeting these sprint goals has been challenging. As a team, we have experienced a lot of frustration due to our inability to get access to the data we need. On numerous occasions, the means of accessing the data have been insufficient for bulk reporting.

Getting Council Tax data from Academy

Academy is the Capita-owned system that contains records relating to Council Tax, Housing Benefit and Business Rates. It uses an Ingres database, which is notoriously difficult to extract data from but has massive potential for analytics.

After a research ‘spike’ to decide on the best approach to ingesting Council Tax data from Academy, we discovered that there is an API, but it doesn't support the bulk downloads we would need to use it effectively. We also knew that it wouldn't be good practice to connect straight to the database and run complex queries, as this could slow down the Academy software.

We investigated restoring the Academy application from a disk backup, but discovered that the encryption keys don't allow sharing across AWS accounts. We also investigated creating a ‘read’ replica. Ideally, this would mean a copy of the database sitting in the Academy account, which we could connect to and query. However, this is currently blocked by a lack of access and requires some negotiation with the vendor.

Getting Planning Data from Tascomi

Tascomi is the system used by the planning team. We have been provided with an API by the vendor to access planning data.

The first version of the Tascomi data workflow has been deployed. This means that we are now able to get a change-only update of new or amended records each day and add it to our previous snapshots to create a full dataset. We are currently onboarding a planning data analyst on the use of Qlik and Athena. We are also getting help from another data analyst (Adam from the Data and Insight team) with further refinement.
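
As an illustration of the change-only pattern, a daily pull might look like the sketch below. The endpoint, authentication and query parameters are entirely hypothetical; Tascomi's real API will differ.

```python
# A minimal sketch of fetching a change-only daily update from a vendor API
# and landing it for downstream parsing. Endpoint, auth and parameters are
# entirely hypothetical placeholders, not Tascomi's real API.
import json
from datetime import date, timedelta

import requests

yesterday = (date.today() - timedelta(days=1)).isoformat()
response = requests.get(
    "https://example-planning-api.test/planning_applications",  # placeholder URL
    params={"updated_since": yesterday},          # hypothetical change filter
    headers={"Authorization": "Bearer <token>"},  # placeholder auth scheme
    timeout=30,
)
response.raise_for_status()
records = response.json()

# Land the increment for the snapshot-building step described above.
with open(f"increment-{yesterday}.json", "w") as f:
    json.dump(records, f)
```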

However, we continue to have some challenges with the vendor, Tascomi, turning the API access on and off without prior warning. 

Next Show & Tell – Friday 12th November

Our next Show & Tell is on the 12th of November, from 12 to 12.30pm. Come along to find out more about what we are up to, and invite others who may be interested (the calendar invite is open). Email ana.bebic@hackney.gov.uk if you need any help. For anyone who can't make it, don't worry: we will record the session and post it on Currents/Slack afterwards.