Why getting the right name for your APIs is so important

Series 1 – Chapter 2

To continue from our previous blog post about our journey of defining platform and service APIs and what we learned from it: we knew, without a doubt, that the next step in this journey should be data taxonomy.

Organizations are often locked into legacy, middle-aged, clunky (;-)) databases. We face the challenge of tackling cases where our on-premises databases have no basic database principles applied. In other cases, we found that people are still reluctant to bid goodbye to their tightly coupled processes, or are scared of any process change that might impact them. These databases are typically a developer's nightmare, especially when it comes to integration. I have also been in situations where we wanted to open up the data from the source database and make it more available (in a secure manner, of course). So when we're building an API to unlock data from a legacy application or business process, our challenge is: ‘What do we call that API so that it's clear what data it presents, without assuming how it should be used?’

So this is why data taxonomy is so important to the design of our Platform APIs.

Data taxonomy is the process of classifying your data and building a hierarchical structure that becomes the foundation for the API. Across our APIs, the taxonomy then helps us explain the relationships between different data models.
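To make this concrete, here is a minimal sketch of what a hierarchical taxonomy around a core entity could look like, expressed as TypeScript types. The entity and field names are illustrative assumptions, not our actual data model:

```typescript
// Hypothetical sketch: the "People" core entity at the root of the hierarchy,
// with domain records branching off it instead of duplicating it.
interface Person {
  id: string;
  firstName: string;
  lastName: string;
  dateOfBirth?: string; // ISO 8601 date
}

// Each domain record links back to the core entity via personId.
interface HousingTenancy {
  personId: string;
  tenancyReference: string;
  propertyId: string; // relates to the "Properties" core entity
}

interface CouncilTaxAccount {
  personId: string;
  accountNumber: string;
}

interface ParkingPermit {
  personId: string;
  permitNumber: string;
  validUntil: string;
}
```

Laying the hierarchy out this way makes the relationships between domains explicit: every domain record points back to the core entity rather than carrying its own copy of the person's details.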

We started our journey by identifying our core entities and tried to model different data domains around them. Believe me, the workshop was well executed, and we ended up with a nicely drawn spider web of all the domains related to our core entities – People and Properties. This step was really important for building our landscape of Platform APIs and understanding how usage flows between different services, as we did not want to get our fingers burnt again. It also helped us identify future Platform APIs to develop.

We picked “People” as the first core entity. But in a local government context, “people” means different things depending on the service: the housing tenant might be different from the council tax payer and the holder of the parking permit. So we refined that further into a Resident Information API (we decided that contact details should sit in a separate layer). I will be honest here: initially I thought this would be easy, but reality slowly kicked in. In our discovery session, we realized we have 26 data sources – and maybe more – that store contact information about residents. The team was gobsmacked. We immediately asked how we would cope with this mammoth task. We knew it would involve a lot of iterations and improvements, so we decided to start with baby steps and learn from them.

We started with six initial datasets which we thought would be a good starting point:

  • Adults' and children's social care data
  • Housing data
  • Housing benefits data
  • LLPG data (holding the address information)
  • Flags we'd identified that might indicate someone is vulnerable (e.g. living alone)
  • Data about people who asked for help during COVID

The use case that followed was to provide a Platform API that retrieves data from individual line-of-business application APIs in order to present a consolidated view of all the data we hold about a given person across multiple data sources. As part of a future iteration, we are also considering storing audit logs of any changes made to the data.
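As a rough sketch of that pattern (not our actual implementation), the consolidating Platform API fans out to the underlying source APIs and merges whatever comes back. The endpoint URLs and response shapes below are assumptions for illustration only:

```typescript
// Hypothetical fan-out/merge: query each line-of-business API in parallel and
// consolidate the results into one view of a person.
interface PersonRecord {
  source: string; // which dataset the record came from
  firstName: string;
  lastName: string;
  address?: string;
}

// Assumed source APIs for some of the datasets named above.
const sourceApis = [
  { name: "social-care", url: "https://api.example.gov.uk/social-care/persons" },
  { name: "housing", url: "https://api.example.gov.uk/housing/persons" },
  { name: "housing-benefits", url: "https://api.example.gov.uk/benefits/persons" },
];

async function getConsolidatedPerson(query: string): Promise<PersonRecord[]> {
  // Promise.allSettled tolerates individual failures, so one slow or offline
  // backend does not take down the whole consolidated view.
  const results = await Promise.allSettled(
    sourceApis.map(async (api) => {
      const res = await fetch(`${api.url}?name=${encodeURIComponent(query)}`);
      if (!res.ok) throw new Error(`${api.name} responded ${res.status}`);
      const records: Omit<PersonRecord, "source">[] = await res.json();
      return records.map((r) => ({ ...r, source: api.name }));
    })
  );
  return results
    .filter((r): r is PromiseFulfilledResult<PersonRecord[]> => r.status === "fulfilled")
    .flatMap((r) => r.value);
}
```

The important design choice is that the consolidation happens at request time: nothing is copied into a new database, so the view is only ever as stale as the source APIs themselves.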

Hackney has a track record of tackling this problem. Our Citizen Index is more than ten years old and enables us to match people's records across business applications and verify their identity. Next, we will build on that understanding to determine whether, and how, we create linkages between different records of people.

Having a data taxonomy designed for our APIs helps us think in the right direction about data relationships and build a knowledge model around the different data domains. In other words, the taxonomy for a given domain should be the foundation of any API development. Having a picture of the relationships between those domains will also help when designing your microservice architecture, simplifying development and integration.

What’s Next?

  1. Building the SwaggerHub definitions and best practices
  2. Publishing our domain structure
  3. Securing our data
  4. Data Migration Mechanism

Appendix

Data classification is the process of organizing a dataset into relevant categories, supporting efficient data management, improved data security, and streamlined compliance audits when it comes to sensitive data.

Unleashing the data beast: Our Journey so far…

What do we aim to achieve via this blog?

This blog is designed for HackIT and external organisations to evaluate our API-first strategy for data domain modeling and how we approach our data needs! It will also give us the confidence to explore further in terms of architecture and data taxonomy. We understand the importance of reusability across service needs, and of designing the core architecture in such a way that it doesn't add more technical debt for the future.

Why… How… What???

  • Why are we building Platform APIs, and what value, in terms of return on investment, will they bring?
  • And, perhaps the biggest question, what are they?

We all wish for an advanced, data-led ecosystem, and I am sure it is everyone's dream to have a scalable, robust, secure and resilient platform to achieve it. Whenever we build any service, these are the questions that come to mind:

  1. Is the service highly available and reliable, and how secure is our data?
  2. How do we scale to meet the demand in traffic from our applications and keep them reliable?

But most importantly we would like to ensure that we follow these principles:

  1. We want to help residents and solve their problems via technology.
  2. Try not to introduce more problems than we have currently!
  3. If the problems are huge, finding simple solutions is our first priority.

Software engineers follow many principles, and they change over time, but two in particular have always resonated. Typically, within the world of IT, they take the form of acronyms.

Firstly, “DRY”, which stands for “Don't repeat yourself”. At the micro level, that means you shouldn't have the same line(s) of code doing the same thing more than once within your code base. For example, if you connect to a database, do it ONCE and make it a reusable function. This makes code easier to change, test and maintain, as well as reducing inconsistency. At the macro level this still holds true: you do not want the same thing being done by multiple code bases (think multiple APIs that do the same thing). Violations of the DRY principle are sometimes referred to as “WET solutions” (yes, another acronym). “WET” can stand for “write every time”, “write everything twice”, “we enjoy typing” or “waste everyone's time”.
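To illustrate the database example at the micro level, here is a minimal sketch assuming a PostgreSQL backend via the `pg` library; the names and connection details are hypothetical:

```typescript
import { Pool } from "pg";

// DRY: one connection pool, created ONCE, shared by the whole code base.
const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// A single reusable helper instead of repeating connection and query
// plumbing in every module (the "WET" alternative).
export async function query<T>(sql: string, params: unknown[] = []): Promise<T[]> {
  const result = await pool.query(sql, params);
  return result.rows as T[];
}
```

Every caller now goes through `query`, so a change to connection handling, logging or error handling happens in one place.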

This repetition between APIs isn't just inefficient: it can lead to inconsistencies and waste developers' time reinventing the wheel. For example, we already have more than one API which pulls out Cautionary contact data, which is information about residents who may be violent or are vulnerable. This data is vital for protecting both our residents and our officers. These APIs pull the Cautionary contact data from multiple data sources. In fact, within just ONE of those data sources (Universal Housing) there are multiple locations (database tables) where that data is held, which leaves a lot of room for error. A potential consequence is that an officer could be sent to do a repair at the home of a violent resident without knowing it, EVEN though the council does have that information somewhere.

This is where Platform APIs come in. They ensure that we only write the code that accesses a particular dataset ONCE. If an application or a different API (what we are calling a “Service API”) needs to access the same dataset, it does so via the SAME Platform API. If we need to change the source for that dataset, or add an extension, we can do so in ONE location and all of the clients will receive the change. The structure of the response data all clients receive will be identical and standardised. From a security perspective, we will have ONE gateway into our data, which means there is only ONE place we need to secure.
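One way to picture the “change it in ONE location” property is a Platform API handler that depends on a single repository interface, so swapping the underlying data source never touches the clients. This is a sketch under assumed names, not our actual code:

```typescript
// The Platform API depends only on this interface; clients depend only on
// the HTTP contract. Both stay unchanged when the data source moves.
interface CautionaryContactRepository {
  getByPersonId(personId: string): Promise<{ personId: string; alertType: string }[]>;
}

// Today's source: a legacy on-premises database (details elided).
class LegacyDbRepository implements CautionaryContactRepository {
  async getByPersonId(personId: string) {
    // ...query the legacy tables here...
    return [{ personId, alertType: "example" }];
  }
}

// Tomorrow's source could be a cloud data store; only this ONE binding changes.
const repository: CautionaryContactRepository = new LegacyDbRepository();

// The single entry point every client (Service API or application) goes through.
export async function handleGetCautionaryContacts(personId: string) {
  return repository.getByPersonId(personId);
}
```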

The second key acronym is “KISS”, which stands for “Keep it simple, stupid” (more polite versions also exist). The simpler an API is, the less chance there is that something can go wrong with it, and the easier it is to test. We want to ensure that the APIs we provide are built to the same standards, are easy to use, and are so good that people prefer to use them.

We know that legacy systems tend to be stable, but they can be slow-moving, sometimes offline during scheduled maintenance, and unable to keep up with technology evolution. The result is services becoming unresponsive while the backend is busy applying changes or running a massive process. What happens in the end is that ICT gets questioned about service unavailability, residents are unhappy, and so on. So how do we solve these problems?

We engineers always think in terms of unraveling the data: making it securely open and reusable, catering to similar needs, providing a seamless user experience, and making it available for consumption so services can benefit from it. This gets things going quickly and keeps us thinking in a problem-solving direction. If the approach becomes sneaky or manipulative, the goal is never achieved. In other words, try to build something that benefits the people who use it.

This quote perfectly captures our scenario:

“Nothing happens unless first a dream.” – Carl Sandburg

So what is our goal and dream? To have a series of Platform APIs used at their best!

We began our journey with APIs to achieve the above goals. We introduced the concepts of Platform APIs and Service APIs, identified the benefits of each category, and defined the criteria for each one. Reusability, consistency, quicker access, meeting user and data needs (the most important foundation of our approach), and faster development of services were the key things we wanted to achieve. The journey was not straightforward. We had to convey our vision to our colleagues in an understandable, non-technical manner to ensure a collaborative approach. At the same time, we realized there was a mammoth task of understanding the domain model and what needed correcting, while working with data that was dispersed with no relation whatsoever in the first place. We had challenges in every direction. (Sigh!!)

My first action was getting the team trained in the latest technologies for building reliable REST APIs, which we all took advantage of and accepted as a challenge to excel at. To my surprise, the team smashed it beyond my expectations. By this time we had a suitable platform in place for deploying our APIs and making them securely available for public consumption. We continue to grow and improve our criteria for a successful framework.

We have also been challenged many times to think about the front-end services that would use our APIs. Our answer is: if the data is currently being used for a certain internal process, why not make it more available? Why don't we divide the domain into reusable components, which we know are likely to be the foundation of future services? For example, we know that building an application for residents to check their rent balance means we need to retrieve “people data” (names), transactions, addresses, etc. Those sub-domains can be exposed via individual Platform APIs that are then reused by any other service that needs the same data (sketched below). One question we often get is: why aren't our APIs driven by specific user needs related to one front end or service? Because then the APIs would be what we call “Service APIs”, built around the specific needs of one service, which would prompt duplication, since the APIs would all be very similar, consuming the same data domains. For quick wins, these Service APIs can look very attractive, but in the long run, will they be useful? To how many services? Are we duplicating effort? Are we making more room for error? Food for thought!! Following HackIT's principle of “fail fast”, we have learned that quick wins often mean more work and are not viable future solutions.
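Sketching that rent-balance example (with hypothetical endpoint names and shapes), the Service API becomes a thin layer that composes the reusable Platform APIs rather than querying any data source itself:

```typescript
// Assumed Platform API endpoints, each owning one reusable sub-domain.
const PEOPLE_API = "https://api.example.gov.uk/platform/people";
const TRANSACTIONS_API = "https://api.example.gov.uk/platform/transactions";
const ADDRESSES_API = "https://api.example.gov.uk/platform/addresses";

// The Service API's view model: only what the rent-balance screen needs.
interface RentBalanceView {
  name: string;
  address: string;
  balance: number;
}

async function getJson<T>(url: string): Promise<T> {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`${url} responded ${res.status}`);
  return res.json() as Promise<T>;
}

// Compose the three Platform APIs into one view. Any other service needing
// people, transactions, or addresses reuses the same Platform APIs.
export async function getRentBalance(personId: string): Promise<RentBalanceView> {
  const [person, transactions, address] = await Promise.all([
    getJson<{ name: string }>(`${PEOPLE_API}/${personId}`),
    getJson<{ amount: number }[]>(`${TRANSACTIONS_API}?personId=${personId}`),
    getJson<{ fullAddress: string }>(`${ADDRESSES_API}/${personId}`),
  ]);
  const balance = transactions.reduce((sum, t) => sum + t.amount, 0);
  return { name: person.name, address: address.fullAddress, balance };
}
```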

We strongly feel that data needs should drive an API-first strategy, not front-end needs, as the latter tend to change a lot over time. These data-driven APIs will shine once we have a connected, cloud-based ecosystem using them to their full potential, so why would we make them specific to one service?! I understand there will be scenarios where we have to develop quick APIs, but it shouldn't always be the case! However, there is another side to the coin. We have data sources that are ‘x’ years old with no basic database principles applied whatsoever. Building APIs with these data sources as the main entities is a huge challenge, because there is no strong foundation of data taxonomy for us to follow.

We are currently partnering with Madetech and AWS Solution Architect Manish to achieve the goal of producing an API that brings resident information together from several data sources into a consolidated view. Our aim is not to create a new data source like the Citizen Index; it is to provide a consolidated view of the data in real time, as retrieved by the APIs. This approach will also allow the data to be used for real-time analysis and visualization.

What are we trying to achieve?

  1. All services consume Platform APIs, rather than speaking directly to the underlying data sources, centralizing access to data.
  2. Build data pipelines to make existing data more available from on-premises data sources, such as Qlik data.
  3. Get the data normalized and into a sensible format; this will be a dream to achieve, as some of the data sources have no relevant degree of normalization.
  4. Ensure open data standards are followed.
  5. Have a catalogue of all cloud-based Platform APIs in the API Hub, which should be the one-stop shop for all APIs, with the required authorization process in place.
  6. Understand our core objects well, define the data taxonomy around them, build further data models, and stimulate innovation via our open APIs.
  7. Expose APIs by establishing data contracts as a step forward.
  8. Engage different services to understand the vision.
  9. Secure our APIs in a way that does not open up possibilities for misuse.

Things to keep in mind, or to explore in the future

  1. How do we keep API-first in mind when designing the relevant data pipelines?
  2. As an organisation we have both structured and unstructured data. I've been reading about how a data lake approach enables some organisations to use their data at its best and to build complex business processes powered by advanced analytics. There are several case studies available that we could learn from. But we want to learn more!

Standing on the shoulders of giants

Hackney Council is prototyping an API-centric digital architecture. The HackIT manifesto tells us to ‘learn more’, so it was important for us to learn from experts who have successfully implemented a web API in a government environment at a larger scale. HMRC was the best candidate to visit, providing our developers with a unique opportunity to learn from the experts.

The visit to the HMRC office in Shipley was really useful and gave us a deep understanding of how they implemented, and now maintain, their API platform. The team took us through the development journey of their API platform and provided insightful information to support us on ours.

API-centric development at HMRC started in January 2015, initially involving three teams. After four months of recruiting people with the right skills, they started their API development, taking two years to publish the first public APIs. The most requested API, from taxpayers, was for the self-assessment module. The team had SMEs and contract developers to deliver the first set of APIs, and gradually the team expanded based on user needs.

Key things learned from this visit:

  1. Overall Technical Architecture: HMRC has an in-house developed application which manages subscription, authentication, authorization and metrics processing for incoming client requests. In retrospect, they said it would have been better to go with an external vendor instead of building the application in-house, which resulted in additional effort to support it; however, they also explained that the in-house approach avoided vendor lock-in. Adopting a similar strategy would be very beneficial for Hackney, enabling us to monitor and maintain our web API efficiently. They also use multiple levels of abstraction in their infrastructure, such as proxy servers and gateways, in order to make the API secure. Most of their infrastructure is hosted on AWS.

This inspires us to revisit our API architecture and create layers of abstraction before deploying our APIs publicly, so that they are not exposed to vulnerabilities. It also prompts us to start thinking about how we can host our API platform in the cloud. There will be some challenges around connectivity to our back-office systems, but products such as AWS Direct Connect give us confidence that such a hybrid platform is a viable solution.

They also follow a DevOps approach to service delivery, whereby one or two infrastructure staff are embedded in the development teams delivering APIs. Having a platform team heavily involved in the continuous integration of the web API is a brilliant concept.

  2. Product management and understanding user needs:

Even though they develop APIs, they still follow a user-centered approach to development, where the client developers are considered the users. This extends their API-first approach: endpoints are developed to meet user needs, not around the workings of back-office systems.

This approach makes us ask: “Are we going in the right direction with our web API?” We believe adopting a similar API-first approach, where we develop our endpoints for our users (developers), would help us create a flexible, standardized web API.

They hold design clinics every month to introduce and discuss design principles, and they often hold hackathons, inviting external developers to work with their API as part of their user research. We currently do something similar with our Show and Tells, but we could expand this on a wider scale by inviting external agencies who are interested in our web API.

  3. Skills and Capabilities: Any skills not available internally are outsourced. These skills are then introduced and shared within and among teams; this way, skills are retained. As mentioned earlier, there was a sense of clear standards being set up and embedded within the team by following the same taxonomy throughout. We have adopted a similar approach of working with external companies, but could do more around the transfer of knowledge and skills, possibly by highlighting key new skills in project retrospectives.
  4. API Service Standards: They are building their own API standards, based on the GDS standards. The standards have been through a well-thought-out process, with a lot of scrutiny and consultation with various teams, including technical architects and enterprise architects. A separate API Standards and Assurance team is responsible for reviewing changes to the APIs and making sure they follow the standards that have been set. This well-embedded process is something we are currently lacking. We are working on our own Hackney Development Standards, which will ensure that the same taxonomy is followed throughout our APIs and that new developers coming on board can easily understand them.

All in all, it was truly a well-spent day with the HMRC API team, with so much to take on board given their massive API implementation. We did realize it would have added even more value to invite our platform colleagues along to learn more about the infrastructure; maybe next time.