Data Platform Project Weeknote 21, 21.02.2022

About the Data Platform
The vision for the Data Platform is to rebuild our data infrastructure as a secure, scalable and reusable cloud-based platform that brings together the council’s key data assets. This will enable us to democratise access to data (where appropriate), use technology to enable deeper insight, and derive greater value from our data to improve the lives of residents in Hackney. For more information, see our Playbook.

How we’ve been building the platform this week

Creating a reusable process to ingest data from our in-house applications through event streaming so that we can make housing data available for reporting: This week we’ve been deploying, configuring and testing a self-hosted schema registry as an alternative to the AWS Glue Schema Registry, which we’ve had issues using as part of our Kafka consumer. The schema registry is now deployed and storing our schemas. We’ve been setting up Kafka Connect to use it, which has involved trialling a number of different configurations to test whether our AWS Connect Plugin can use the schema registry to deserialise events that are pushed into Kafka. In short, our preferred solution to the blocker we’ve been grappling with in recent weeks looks promising, but we need to complete our testing to be sure.
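For the curious, the kind of check we’re running looks roughly like the sketch below: read a single event from a topic and deserialise it against the self-hosted registry. It’s a minimal illustration using a recent version of the confluent-kafka Python library; the broker address, registry URL and topic name are placeholders, not our real configuration.

    # Minimal sketch: confirm events on a topic can be deserialised using a
    # self-hosted schema registry. Broker, registry URL and topic are placeholders.
    from confluent_kafka import Consumer
    from confluent_kafka.schema_registry import SchemaRegistryClient
    from confluent_kafka.schema_registry.avro import AvroDeserializer
    from confluent_kafka.serialization import SerializationContext, MessageField

    registry = SchemaRegistryClient({"url": "https://schema-registry.example.internal"})
    deserialise = AvroDeserializer(registry)  # writer schema is fetched from the registry

    consumer = Consumer({
        "bootstrap.servers": "kafka.example.internal:9092",
        "group.id": "schema-registry-smoke-test",
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe(["housing-events"])  # placeholder topic name

    msg = consumer.poll(timeout=30.0)
    if msg is not None and msg.error() is None:
        event = deserialise(msg.value(), SerializationContext(msg.topic(), MessageField.VALUE))
        print("Deserialised event:", event)
    consumer.close()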

Creating a reusable process to ingest data from an MS SQL database so that we can make Council Tax data available for reporting and reuse: Good news! We have some Academy data on the platform! This week we completed the tech spike and managed to connect to the Academy Insights SQL Server database from the Data Platform, creating a reusable process for ingesting data from SQL Server databases and other database types. We will now start pulling the rest of the data into our Data Lake (S3 storage) and monitor the computing power required, as there are almost 300 tables in the Academy Insights database. As highlighted in the ADR (Architecture Decision Record), there are still a few questions to answer, specifically around how we can reuse previous data warehouse work to get the data ready for consumption by data analysts and other users. We will be addressing these questions and exploring ideas in the coming weeks.
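In outline, the reusable ingestion pattern looks something like the PySpark sketch below: read a table over JDBC and land it in the Data Lake as Parquet. The server, table, credentials and bucket names here are placeholders (the real job pulls credentials from a secrets store and needs the SQL Server JDBC driver on the classpath), so treat this as an illustration rather than our production code.

    # Sketch of the SQL Server ingestion: read one table over JDBC and write
    # it to the Data Lake as Parquet. Connection details, table and bucket
    # names are placeholders, not our real configuration.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("academy-ingestion-spike").getOrCreate()

    table_df = (
        spark.read.format("jdbc")
        .option("url", "jdbc:sqlserver://academy-insights.example.internal:1433;databaseName=Insights")
        .option("dbtable", "dbo.ctax_accounts")  # one of the ~300 tables
        .option("user", "data_platform_reader")
        .option("password", "<from-secrets-manager>")
        .load()
    )

    (table_df.write
        .mode("overwrite")
        .parquet("s3://example-data-platform-raw-zone/academy/ctax_accounts/"))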

Setting up AWS cost alerting so that we can better monitor our usage and spend: We have identified how to set a budget that notifies specific users if forecasted or actual expenditure goes over a set limit. We have also created a Lambda to dynamically adjust the budget limit based on the previous month’s outgoings. This is currently in testing.
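The budget-adjusting Lambda works along the lines of the sketch below: ask Cost Explorer for last month’s actual spend, then update the budget limit with some headroom on top. The budget name, 10% buffer and currency are placeholder assumptions rather than our final settings.

    # Sketch of a Lambda that resets the monthly budget limit from last month's
    # actual spend plus a buffer. Budget name, buffer and currency are assumptions.
    import datetime
    import boto3

    BUDGET_NAME = "data-platform-monthly"  # placeholder
    BUFFER = 1.1  # allow 10% headroom over last month's spend

    def handler(event, context):
        today = datetime.date.today()
        first_of_this_month = today.replace(day=1)
        first_of_last_month = (first_of_this_month - datetime.timedelta(days=1)).replace(day=1)

        ce = boto3.client("ce")
        usage = ce.get_cost_and_usage(
            TimePeriod={"Start": first_of_last_month.isoformat(),
                        "End": first_of_this_month.isoformat()},
            Granularity="MONTHLY",
            Metrics=["UnblendedCost"],
        )
        last_month_spend = float(
            usage["ResultsByTime"][0]["Total"]["UnblendedCost"]["Amount"]
        )

        budgets = boto3.client("budgets")
        budgets.update_budget(
            # the account id is the fifth field of the function ARN
            AccountId=context.invoked_function_arn.split(":")[4],
            NewBudget={
                "BudgetName": BUDGET_NAME,
                "BudgetLimit": {"Amount": str(round(last_month_spend * BUFFER, 2)),
                                "Unit": "USD"},
                "TimeUnit": "MONTHLY",
                "BudgetType": "COST",
            },
        )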
Migrating Qlik into the platform infrastructure: We had been working to provide access to Qlik through the GlobalProtect VPN rather than AppStream; however, our testing found that this wasn’t a viable solution. We’ve now proposed that secure access to Qlik be provided directly over the web via Google Single Sign-On (which includes two-factor authentication), and this week we took this proposal to the Technical Design Authority (TDA). We had a good discussion, but a number of issues came up: would an additional firewall be beneficial? What are the implications for the egress of data out of Qlik (and possibly onto users’ own devices)? How will the Qlik infrastructure be maintained to ensure we’re doing all we can to minimise any vulnerabilities? We need to explore these risks and controls further with our colleagues in cloud engineering and security before going back to the TDA. It’s crucial that access to Qlik is well considered, but unfortunately for users this means a further delay in taking it out of AppStream and improving their experience.

How the platform has been used this week:

A reusable training module to help onboard users – In the last few weeks we have been building a set of training modules that take users through the process of ingesting, transforming and deploying data on the Data Platform. The training gives users hands-on experience with the AWS Console, GitHub, Docker and Terraform. We are trialling the materials with a cohort of Data and Insight analysts before refining them for wider use. We delivered the first module (ingesting Google Sheets) last week and will cover transforming data in Python notebooks this week.
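To give a flavour of the first module, the core of the Google Sheets ingestion step boils down to something like the sketch below, using gspread and pandas. The spreadsheet key, service-account credentials and bucket name are placeholders, and the real module runs this through the platform’s own tooling rather than as a standalone script (writing Parquet to S3 also needs pyarrow and s3fs installed).

    # Rough sketch of the Google Sheets ingestion covered in the first training
    # module: read a sheet with a service account and land it in S3 as Parquet.
    # Spreadsheet key, credentials path and bucket name are placeholders.
    import gspread
    import pandas as pd

    client = gspread.service_account(filename="service-account.json")
    worksheet = client.open_by_key("<spreadsheet-key>").sheet1

    df = pd.DataFrame(worksheet.get_all_records())
    df.to_parquet("s3://example-data-platform-landing-zone/my-department/my-sheet.parquet")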
Exploring how Single View could use the Data Platform – Single View was an application that displayed information from a number of other applications in one place, to help officers better understand a resident’s circumstances and save time. It hasn’t been available since the cyber attack, and a new project has recently started to revive this type of functionality (see their first weeknote here). We met with the team to tell them about the Data Platform and discuss two possible ways to integrate: 1) Single View consuming data from applications like Vonage and Academy via the Data Platform rather than directly from those applications, which would mean we’re not duplicating the effort to get at these data sources; 2) using the Data Platform to match resident records across datasets and provide automated or suggested matches to users, rather than them having to match everything manually. We don’t have to do both of these things at once, and it makes sense to focus on exploring the first. However, the Single View team first needs to decide who their initial users are and what datasets they need.
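To make the second option concrete, a “suggested match” could be as simple as scoring name similarity between datasets, along the lines of the rough sketch below. This is purely illustrative: the field names and threshold are assumptions, and the actual matching rules haven’t been designed yet.

    # Purely illustrative sketch of "suggested matches": score name similarity
    # between two datasets and surface likely candidates for a user to confirm.
    # Field names and the 0.85 threshold are assumptions, not agreed rules.
    from difflib import SequenceMatcher

    def similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

    def suggest_matches(residents_a, residents_b, threshold=0.85):
        suggestions = []
        for a in residents_a:
            for b in residents_b:
                score = similarity(a["name"], b["name"])
                if score >= threshold and a.get("postcode") == b.get("postcode"):
                    suggestions.append((a["id"], b["id"], round(score, 2)))
        return suggestions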

Up Next:

Building:
Ingesting Modern Tools for Housing data: verify whether our new schema registry will work
Ingesting Council Tax data: ingest a full set of Academy data from the Insights database so that analysts can start to work with this data
Working with the Cloud Engineering and Security teams to further assess the options for providing secure access to Qlik
Using:
Running the training module on transforming data in Python with members of Data & Insight
