Series 1 - Chapter 2
Continuing from our previous blog post about our journey of defining platform and service APIs and what we learned from it, we knew that the next step in this journey had to be data taxonomy.
Organizations are often locked into legacy, middle-aged, clunky (;-)) databases. We have had to tackle cases where our on-premises databases don’t apply even basic database principles. In other cases, we found that people are still reluctant to say goodbye to their tightly coupled processes, or are scared of any process change that might affect them. These databases are typically a developer’s nightmare, especially when it comes to integration. I have also been in situations where we wanted to open up the data in the source database and make it more widely available (in a secure manner, of course). So when we’re building an API to unlock data from a legacy application or business process, our challenge is: ‘What do we call that API so that it’s clear what data it presents, without assuming how it should be used?’
So this is why data taxonomy is so important to the design of our Platform APIs.
Data taxonomy is the process of classifying your data and building a hierarchical structure, which becomes the foundation for the API. Across our APIs, the taxonomy then helps us explain the relationships between different data models.
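To make the idea of a hierarchical structure a bit more concrete, here is a minimal sketch of a taxonomy held as a tree. It is only an illustration: the `TaxonomyNode` class and the domain names below are hypothetical and are not our actual taxonomy.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaxonomyNode:
    """One node in a hypothetical data taxonomy (a domain or sub-domain)."""

    name: str
    children: list[TaxonomyNode] = field(default_factory=list)

    def add(self, name: str) -> TaxonomyNode:
        """Create a child domain under this node and return it."""
        child = TaxonomyNode(name)
        self.children.append(child)
        return child

    def paths(self, prefix: str = "") -> list[str]:
        """Return the path of this node and of every node below it."""
        here = f"{prefix}/{self.name}" if prefix else self.name
        result = [here]
        for child in self.children:
            result.extend(child.paths(here))
        return result


# Illustrative structure only: these domain names are assumptions.
people = TaxonomyNode("people")
people.add("resident-information")
people.add("contact-details")

properties = TaxonomyNode("properties")
properties.add("addresses")

taxonomy_paths = people.paths() + properties.paths()
for path in taxonomy_paths:
    print(path)
```

Each path could then suggest a name for an API that says what data it presents rather than how it is meant to be used.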
We started our journey by identifying our core entities and tried to model different data domains around them. Believe me, the workshop was well executed and we ended up with a nicely drawn spider web of all the domains related to our core entities: People and Properties. This step was really important for building our landscape of Platform APIs and understanding how the different services use them, as we did not want to get our fingers burnt again. It also helped us to identify which platform APIs we might develop in future.
We picked “People” as our core entity. But in local government, that means different things depending on the context. For example, the housing tenant might be different from the council tax payer and the holder of the parking permit. So we refined that further into a Resident Information API (we decided that contact details should sit as a separate layer). I will be honest here: initially I thought this would be easy, but slowly reality kicked in. In our discovery session, we realized we have 26 data sources, and maybe more, that store contact information about residents. The team was gobsmacked. We immediately asked ourselves how we would cope with this mammoth task. The team knew that it would involve a lot of iterations and improvements, so we decided to start with baby steps and learn from them.
We started with six initial datasets which we thought would be a good starting point:
- Adults’ and Children’s social care data
- Housing dataset
- Housing benefits dataset
- LLPG data (holding the address information)
- Flags we’d identified that might indicate someone was vulnerable (e.g. living alone)
- Data about people who asked for help during COVID
The use case that followed was to provide a platform API that retrieves data from the APIs of individual line-of-business applications in order to provide a consolidated view of all the data we hold about a given person across multiple data sources. We are also considering storing audit logs of any changes made to the data as part of a future iteration.
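The consolidated view described above can be sketched roughly as follows. This is not our implementation: the function name, the source names and the stub functions standing in for line-of-business APIs are all hypothetical, and the sketch assumes each source can already be queried with a shared person identifier, which in practice depends on how records are matched between sources.

```python
from typing import Any, Callable, Dict, List

# A source is any callable that takes a person identifier and returns
# the records that source holds about that person (hypothetical shape).
Source = Callable[[str], List[Dict[str, Any]]]


def consolidated_view(person_id: str, sources: Dict[str, Source]) -> Dict[str, Any]:
    """Query every source and group the results by source name.

    A failing source is recorded under "errors" instead of failing the
    whole request, so the caller still sees what the other sources hold.
    """
    view: Dict[str, Any] = {"person_id": person_id, "records": {}, "errors": {}}
    for name, fetch in sources.items():
        try:
            view["records"][name] = fetch(person_id)
        except Exception as exc:  # broad on purpose: any source failure is reported
            view["errors"][name] = str(exc)
    return view


# Stub sources standing in for real line-of-business APIs (assumptions).
def housing_source(person_id: str) -> List[Dict[str, Any]]:
    return [{"tenancy_reference": "T-123"}]


def social_care_source(person_id: str) -> List[Dict[str, Any]]:
    raise TimeoutError("source unavailable")


view = consolidated_view(
    "p-001",
    {"housing": housing_source, "social_care": social_care_source},
)
print(view)
```

Keeping the results grouped by source, rather than merging them into one record, avoids deciding up front which source is authoritative for a given field.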
Hackney has a track record of tackling this issue. Our Citizen Index is more than 10 years old and enables us to match people’s records across business applications and verify their identity. So next, we will build on our understanding of this to determine whether, and how, we create linkages between different records of people.
Having a data taxonomy designed for our APIs helps us to think in the right direction about data relationships and to build a knowledge model around different types of data domain. In other words, the taxonomy for a given domain should be the foundational step for any API development. Having an idea of the relationships between those domains will help you when designing your microservice architecture, simplifying development and integration.
- Building the SwaggerHub definitions and best practices
- Publishing our domain structure
- Securing our data
- Data Migration Mechanism
Data classification is the process of organizing a dataset into relevant categories, which supports efficient data management, improved data security, and streamlined compliance audits when it comes to sensitive data.
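As a small illustration of how classification can support data security, the sketch below tags fields with a sensitivity category and filters a record by what the caller is allowed to see. The category names, field names and the `visible_fields` helper are all assumptions made for the example, not a scheme we have adopted.

```python
from enum import Enum
from typing import Any, Dict


class Sensitivity(Enum):
    """Hypothetical sensitivity categories, ordered from least to most sensitive."""

    PUBLIC = 1
    INTERNAL = 2
    SENSITIVE = 3


# Hypothetical mapping of field names to categories.
FIELD_CLASSIFICATION = {
    "address": Sensitivity.PUBLIC,
    "tenancy_reference": Sensitivity.INTERNAL,
    "date_of_birth": Sensitivity.SENSITIVE,
}


def visible_fields(record: Dict[str, Any], clearance: Sensitivity) -> Dict[str, Any]:
    """Keep only the fields at or below the caller's clearance.

    Fields with no classification are treated as SENSITIVE, so anything
    that has not been categorized yet is hidden by default.
    """
    return {
        key: value
        for key, value in record.items()
        if FIELD_CLASSIFICATION.get(key, Sensitivity.SENSITIVE).value <= clearance.value
    }


record = {
    "address": "1 Example Street",
    "tenancy_reference": "T-123",
    "date_of_birth": "1970-01-01",
    "notes": "not yet classified",
}
internal_view = visible_fields(record, Sensitivity.INTERNAL)
print(internal_view)
```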