Unleashing the data beast: Our Journey so far…

What do we aim to achieve via this blog?

This blog is designed for HackIT and external organisations to evaluate our API-first strategy towards data domain modelling and how we approach our data needs. It will also give us the confidence to explore further in terms of architecture and data taxonomy. We understand the importance of reusability across service needs, and of designing the core architecture in such a way that it doesn't add more technical debt for the future.

Why… How… What?

  • Why are we building Platform APIs, and what value, in terms of return on investment, will they bring?
  • And, perhaps the biggest question, what are they?

We all wish to have an advanced ecosystem that is led by data and I am sure it is a dream for all to have a scalable, robust, secure and resilient platform to achieve it. Whenever we build any service, these are questions that come to mind:

  1. Is the service highly available and reliable and how secure is our data?
  2. How do we scale to meet the demand in traffic from our applications and ensure they’re reliable?

But most importantly we would like to ensure that we follow these principles:

  1. We want to help residents and solve their problems via technology.
  2. Try not to introduce more problems than we currently have!
  3. If the problems are huge, trying to find simple solutions is our first priority.

Software engineers follow many principles, and they change over time, but two in particular have always resonated. Typically, within the world of IT, they take the form of acronyms.

Firstly “DRY” which stands for “Don’t repeat yourself”. At the micro level that means you shouldn’t have the same line(s) of code doing the same thing more than once within your code base. For example, if you connect to a database then do it ONCE and make it a reusable function. This makes code easier to change, test and maintain as well as reducing inconsistency. At the macro level this still holds true. You do not want the same thing being done by multiple code bases (think multiple APIs that do the same thing). Violations of the DRY principle are sometimes referred to as “WET solutions” (yes another acronym). “WET” can stand for “write every time”, “write everything twice”, “we enjoy typing” or “waste everyone’s time”.
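To make the micro-level point concrete, here is a minimal sketch of the idea in Python. The database file, table and function names are purely illustrative, not our actual code base:

```python
import sqlite3


def get_connection(path: str = "residents.db") -> sqlite3.Connection:
    """The ONE place where connection logic (and any future change to it) lives."""
    return sqlite3.connect(path)


def fetch_people() -> list[tuple]:
    # Every caller reuses the same helper instead of repeating connection code.
    with get_connection() as conn:
        return conn.execute("SELECT id, name FROM people").fetchall()
```

If the connection details ever change, there is exactly one function to update and one function to test.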

This repetition between APIs isn't just inefficient; it can lead to inconsistencies and waste developers' time by reinventing the wheel. For example, we already have more than one API which pulls out Cautionary contact data, which is information about residents who may be violent or are vulnerable. This data is vital for protecting both our residents and our officers. These APIs pull the Cautionary contact data from multiple data sources. In fact, within just ONE of those data sources (Universal Housing) there are multiple locations (database tables) where that data is held, which means there is a lot of room for error. A potential upshot of this is that an officer could be sent to do a repair at the home of a violent resident without knowing it, EVEN though the council does have that information somewhere.

This is where Platform APIs come in. They will ensure that we are only writing code ONCE where that code accesses a particular dataset. If an application or a different API (what we are calling a “Service API”) needs to access the same dataset it will do so via the SAME Platform API. If we need to change the source for that dataset, or add an extension we can do so in ONE location and all of the clients will receive the change. The structure of the response data all of the clients receive will be identical and standardised. From a security perspective we will have ONE gateway into our data which means there is only ONE place we need to secure it.
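As a hedged sketch of the pattern, this is roughly what it means for a Service API or application to go through a single Platform API rather than querying the data source itself. The base URL, endpoint, parameters and token handling below are assumptions for illustration only:

```python
import requests

# Hypothetical Platform API that owns the Cautionary contact dataset.
PLATFORM_API_BASE = "https://api.example.gov.uk/cautionary-contacts"


def get_cautionary_contacts(property_reference: str, token: str) -> list[dict]:
    """All clients call the same gateway, so the response shape, the data
    source and the security controls live in exactly ONE place."""
    response = requests.get(
        PLATFORM_API_BASE,
        params={"propertyReference": property_reference},
        headers={"Authorization": f"Bearer {token}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()["results"]
```

If the underlying data source changes, only the Platform API changes; every consumer keeps calling the same endpoint and receives the same standardised response.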

The second key acronym is “KISS” which stands for “Keep it simple stupid” (more polite versions also exist). The simpler an API is, the less chance there is that something can go wrong with it. The simpler the API is, the easier it is to test. We want to ensure that the APIs we provide are built to the same standards, are easy to use, and are so good that people prefer to use them.

We know for a fact that legacy systems tend to be stable, but they can be slow-moving, sometimes offline during scheduled system maintenance, and unable to cope with technology evolution. This results in services being unresponsive when the backend is busy applying changes or running a large process. What happens, in the end, is that ICT gets questioned about service unavailability, residents are not happy, and so on. So how do we solve these problems?

We engineers always think in terms of unravelling the data: making it securely open, making it reusable, catering to similar needs, providing a seamless user experience, and making it available for consumption so that services can benefit from it. This allows us to get things going quickly and helps us think in the direction of problem-solving. If the approach becomes sneaky or manipulative, the goal is never achieved. In other words, try to build something which benefits the people who use it.

This quote perfectly captures our situation.

“Nothing happens unless first a dream.” — Carl Sandburg

So what is our goal and dream? — Have a series of Platform APIs used at their best!!

We began our journey with APIs to achieve the above goals. We introduced the concept of Platform APIs and Service APIs, identified the benefits of each category, and defined the criteria for each one. Reusability, consistency, quicker access, meeting user and data needs (the most important foundation for our approach), and the development of services were the key areas we were looking to achieve. The journey was not straightforward. We had to ensure that we conveyed our vision to our colleagues in an understandable, non-technical manner to ensure a collaborative approach. At the same time, we realized there was a mammoth task in understanding the domain model and what needed to be corrected, while working with data that was dispersed with no relations whatsoever in the first place. We had challenges in every direction. (sigh!!)

My point of action was getting the team trained on the latest technologies to build reliable REST APIs, which we all took advantage of and accepted as a challenge to excel at. To my surprise, the team smashed it beyond my expectations. By this time we had a suitable platform in place for deploying our APIs and making them securely available for public consumption. We now continue to grow and improve our criteria for a successful framework.

We also have been challenged many times to think about the front end services that would use our APIs. Our answer to this is: if the data is currently being used for a certain internal process, why not make it more available? Why don't we divide the domain into reusable components, which we know are likely to be the foundation of future services? For example, we know that building an application for residents to check their rent balance means that we need to retrieve “people data” (names), transactions, addresses, etc. Those subdomains can be exposed via individual Platform APIs that are then used by any other service that needs the same data, as the sketch below illustrates. One of the questions we often get is why our APIs aren't driven by specific user needs related to a front end or service. Because then the APIs would be what we call “Service APIs”, built according to the specific needs of one service, which would lead to duplication, as the APIs would all be very similar because they consume the same data domains. For quick wins, these Service APIs could look very attractive, but in the long run, will they be useful? To how many services? Are we duplicating effort? Are we making more room for error? Food for thought!! Following HackIT's principle of “Fail fast”, we have learned that quick wins often mean more work and are not viable future solutions.
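To illustrate the composition idea, here is a rough sketch of how a hypothetical rent balance Service API could stitch together separate Platform APIs for people, addresses and transactions. All URLs, paths and field names are made up for the example:

```python
import requests

BASE = "https://api.example.gov.uk"  # hypothetical API Hub base URL


def get_rent_balance_view(person_id: str, token: str) -> dict:
    """Compose several reusable Platform APIs into one view for a front end."""
    headers = {"Authorization": f"Bearer {token}"}

    person = requests.get(f"{BASE}/people/{person_id}", headers=headers, timeout=10).json()
    address = requests.get(f"{BASE}/addresses/{person_id}", headers=headers, timeout=10).json()
    transactions = requests.get(
        f"{BASE}/transactions", params={"personId": person_id}, headers=headers, timeout=10
    ).json()

    # The Service API only shapes the response for one front end;
    # the underlying datasets stay behind their own Platform APIs.
    return {
        "name": person["name"],
        "address": address["fullAddress"],
        "balance": sum(t["amount"] for t in transactions["results"]),
    }
```

The same people, addresses and transactions Platform APIs can then serve any other service that needs those domains, without anyone rewriting the data access.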

We strongly feel that data needs, not frontend needs, should drive an API-first strategy, as frontend needs tend to change a lot over a given period. These data-driven APIs will shine when we have connected the cloud-based ecosystem using them to their full potential, so why would we make them specific to a service?! I understand there will be scenarios where we have to develop quick APIs, but it shouldn't always be the case! However, there is another side of the coin as well. We have data sources that are ‘x’ years old and have no basic database principles applied whatsoever. Building APIs with these data sources as the main entities is a huge challenge because there are no strong data taxonomy principles for us to follow.

We have currently partnered with Madetech and AWS Solution Architect Manish to achieve the goal of producing an API that will bring resident information from several data sources into a consolidated view. Our aim is not to create a new data source like the Citizen Index; it is to provide a consolidated data view in real time, as retrieved by the APIs. This approach will also allow the data to be used for real-time analysis and visualization.
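As a minimal sketch of that consolidated-view idea (assuming each source system already sits behind its own API; the source names, URLs and record shapes below are invented), the consolidation happens at request time rather than by copying data into a new store:

```python
import requests

# Hypothetical source systems, each behind its own API.
SOURCES = {
    "housing": "https://api.example.gov.uk/housing/residents",
    "council-tax": "https://api.example.gov.uk/council-tax/residents",
}


def search_residents(last_name: str, token: str) -> list[dict]:
    """Query each source at request time and merge the matches into one view."""
    headers = {"Authorization": f"Bearer {token}"}
    matches = []
    for source, url in SOURCES.items():
        response = requests.get(url, params={"lastName": last_name}, headers=headers, timeout=10)
        response.raise_for_status()
        for record in response.json()["results"]:
            # Tag each record so consumers can see which system it came from.
            matches.append({**record, "source": source})
    return matches
```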

What are we trying to achieve?

  1. All services consume Platform APIs rather than speaking directly to underlying data sources, thus centralizing access to data.
  2. Build data pipelines to make existing data more available from on-premise data sources such as Qlik data.
  3. Get the data normalized and into a sensible format, which will be a dream to achieve as some of the data sources have little to no relevant normalization.
  4. Ensure open data standards are followed.
  5. Have a catalogue of all cloud-based Platform APIs in the API Hub. The API Hub should be a one-stop shop for all APIs, with the required authorization process in place.
  6. Understand our core objects well, define a data taxonomy around them, build further data models, and stimulate innovation via our open APIs.
  7. Expose APIs by establishing data contracts as a step forward (see the sketch after this list).
  8. Engage different services to help them understand the vision.
  9. Secure our APIs in a way that does not open up possibilities for misuse.
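On point 7, this is a hedged illustration of what a data contract for a Platform API response could look like: a typed, versioned shape that clients can rely on. The fields below are hypothetical, not an agreed schema:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PersonResponseV1:
    """Version 1 of a hypothetical 'person' contract; a breaking change
    would mean publishing a new version rather than altering this one."""
    id: str
    first_name: str
    last_name: str
    date_of_birth: Optional[date] = None
    preferred_contact_method: Optional[str] = None
```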

Things to keep in mind or to explore in the future

  1. How do we keep API-first in mind when designing the relevant data pipelines?
  2. As an organisation we have both structured and unstructured data available. I’ve been reading about how a data lake approach enables some organisations to use their data at its best and to build complex business processes powered by advanced analytics. There are several case studies available which we could learn from. But we want to learn more!