DevOps Practices Weeknotes w/c 23/09/19

Picture of a truck and some pipes
Pipeline coming soon…

This week there were two main developments on the DevOps Practices work: 

  1. we kicked off the planning for the deployment pipeline, which will test our containerisation hypothesis
  2. we have continued our engagement with wider teams, especially infrastructure and security, on our cloud and containerisation work.

Deployment Pipeline

Last week we got agreement for our general approach and started to pull together a list of high-level needs (for more detail: https://blogs.hackney.gov.uk/hackit/devops-practices-w-c-09-02-19). This week we sat down with the team to try to turn this list into our first sprint. Although we set aside two and a half hours for this, we did not get to the stage of having a fully planned sprint 1 for our Deployment Pipeline.

However, we did manage a lot. We started by prioritising the list of needs: we went through them all and picked the top 10 to discuss further. This led to some really good back-and-forth discussions about the value we would deliver, such as whether we should focus on optimising deployment speed or on getting testing right. In line with our general focus on quality over speed, we chose to focus on further testing.

We then built out our list of needs into full user stories, focusing on the impact of meeting each need. Our final list of 11 user stories (one got split into two) was:

  1. As a developer, I want to automate deployment so that I can deploy a set of code changes quickly.
  2. As a developer, I want to prevent secrets from being distributed in the open so that our services remain secure.
  3. As a developer, I want to know that no vulnerabilities through code changes have been introduced so that our services remain secure.
  4. As a developer, I want automated accessibility testing so that our services are usable by all our users.
  5. As a developer, I want to know that no vulnerabilities from 3rd party libraries have been introduced so that our services remain secure.
  6. As a technical architect, I want reusable components that I can use to build a pipeline for my project quickly, so that I can ensure code is delivered to the standards expected by HackIT.
  7. As a developer, I want automated load testing so that I am confident that the service can cope with peak demand.
  8. As a service owner, I want visibility of quality measures (build metrics, automated test reports) so that I am reassured the automation is working.
  9. As a technical architect, I want to know that code being written conforms to best practice and follows the development style at HackIT, so that it is supportable, scalable and consistent.
  10. As a technical architect, I want a consistent naming convention for all objects (code repositories, infrastructure, libraries) so that people can find their way around and understand the architecture of the code.
  11. As a developer, I want a way for our technical documentation to be generated automatically from our code base so that I don’t have to remember to update documentation every time a change is made.

At this point the stories are quite developer heavy. I think this makes sense in the current context of separating our Terraform infrastructure from our applications, but it is something we are aware of and will be keen to redress as we go forward. Let us know if you have any feedback, as these stories are still open to being refined.

Once we had done this we agreed our definitions of done and acceptance criteria for five of these stories.
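
To make these stories a bit more concrete, here is a minimal sketch of how a handful of them might show up as stages in a pipeline. It is illustrative only: the tools named (gitleaks, flake8, pip-audit, pytest, Docker) are examples we could easily swap out rather than decisions we have made, and the script itself just stands in for whichever CI platform we end up using.

```python
# Illustrative sketch only: a "pipeline as a script" showing the kinds of stages
# the user stories above describe. Tool choices are assumptions, not decisions.
import subprocess
import sys

STAGES = [
    ("secret scan",       ["gitleaks", "detect", "--source", "."]),        # story 2
    ("code style checks", ["flake8", "src/"]),                             # story 9
    ("dependency audit",  ["pip-audit", "-r", "requirements.txt"]),        # story 5
    ("unit tests",        ["pytest", "--junitxml=reports/tests.xml"]),     # story 8
    ("build image",       ["docker", "build", "-t", "myapp:candidate", "."]),
]

def run_pipeline() -> int:
    """Run each stage in order, stopping at the first failure (quality over speed)."""
    for name, cmd in STAGES:
        print(f"--- {name}: {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"Stage '{name}' failed; halting the pipeline.")
            return result.returncode
    print("All stages passed; the candidate image is ready to deploy.")
    return 0

if __name__ == "__main__":
    sys.exit(run_pipeline())
```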

Continued Engagement

We have continued to engage with a number of teams to feed into this alpha. People from the infrastructure team are going to be participating in the pipeline work, and this coming week I will be working through the backlog with them to feed into our discussions on Friday. We have also continued to engage on cloud capacity, and these discussions have helped to refine our thinking on cloud procurement.

We are in a really good place to square the circle of keeping the infrastructure team and their skills a part of whatever cloud we procure, while acknowledging that not everything they are responsible for will be cloud hosted in the foreseeable future.

We are also working alongside the security team to feed into the deployment pipeline and to make sure we can embed a culture of security being baked into development.

Next Steps

The team are meeting on Thursday to break down our prioritised stories (with acceptance criteria) into actionable tasks. We will also continue to explore our strategy around procuring a primary cloud supplier. 

As always any feedback is much appreciated. 

DevOps Practices – w/c 09/09/19

The deployment pipeline that we will test during this prototype phase

This week there have been two focuses. The main one is planning how we will test our containerisation hypothesis with two teams who are working to replatform their products. The other has been to build on last week’s work and start thinking about what framework we might use for evaluating a primary cloud supplier.

Deployment Pipelines

This week we have been in discussions with the Manage a Tenancy team and the My Rent Account team who are both about to start replatforming their products. We are going to work with the HackIT developers in these teams to develop a deployment pipeline for the work that they are carrying out. 

This deployment pipeline will build on the work that has already been done by the API-Factory but with a greater focus around containerisation. We are going to use the replatforming of Manage a Tenancy to test that the approach works and then test if it is re-usable with My Rent Account. 

This work has built on the discussions we have had over the last few weeks. From those discussions JJ has put together a proposed structure for the pipeline, which we have been validating with stakeholders (pictured). For the time being we are proposing separating our Terraform infrastructure from our applications, but this is something we may consider bringing together as we develop our skills.

We have started to pull together a list of needs that various roles have of a deployment pipeline, and we are getting people from across HackIT to feed into it. This list of needs will inform our approach over the coming weeks as the rubber hits the road and we start to test our hypothesis on containerisation in earnest.

Evaluating a Primary Cloud Supplier

We are building on the next steps from the workshop the week before last to start to think about how we will identify the right primary cloud supplier for HackIT. To help with this Ciaran has pulled together a list of questions for us to answer focusing on (in no particular order): connectivity, skills, costs, technology, and other general considerations. 

Once we have started to answer those questions, we will be in a position to know where we need further investigation of user needs and technology, and where we are okay. I also spoke to Rob about the work that was done on ‘Next generation productivity’, which is a procurement approach we are looking to emulate for picking a primary cloud supplier.

Next steps

This week we will be breaking down the tasks that need to be carried out to build the pipeline for delivery. We will be holding further discussions around our cloud approach and we will be speaking to Applications teams about why containerisation is relevant to them and the work they do.  

DevOps Practices: A (primary) cloud on the horizon w/c 02.09.19

Self-portraits from our cloud workshop, used to indicate each person’s preferred approach. The quality of the discussions was far better than the self-portraits 😉

For these weeknotes I am going to write about the workshop we held last Wednesday afternoon to discuss ‘Our approach to cloud – next steps’. Although JJ has continued work on our pipeline, which I will write about in more detail at a later date, this week the focus is cloud. We have started exploring the second hypothesis of this alpha, namely:

“Picking a primary cloud supplier will save us time and money”.

This workshop involved managers from a range of teams and disciplines across HackIT and was a chance for us to have an open and constructive conversation about our views on our cloud approach.

At the start of the workshop we set out to:

  • have a shared understanding of the terms/ideas we use around cloud
  • understand where we have differences of opinion, and where we have consensus
  • explore our assumptions, hopes and fears
  • have an idea of the key strategic decisions we must make, and where we can safely test out ideas

Where are people at?

We wanted to start with a quick check on everyone’s current take on cloud. To do this we each did a self-doodle on a Post-It and used those to mark where we were along a range of potential approaches, covering private cloud, single public cloud, hybrid cloud and multi-cloud. We quickly discarded private cloud (as no one picked that option) and focused on the other three.

We asked people to go through why they placed their views in each bucket. One of the things that emerged through this discussion was that the things we valued were pretty consistent between people, even where we’d placed our Post-Its in different places. The common considerations that came up were: users’ and services’ needs, flexibility, simplicity / ease, value for money, and innovation.

Although people initially differed on which bucket they put their views into, through discussion a consensus emerged around a single approach that we felt best satisfied the considerations outlined above: adopting a ‘primary cloud, but with exceptions’. Some people called this multi-cloud and some single public cloud, but everyone broadly agreed this should be our approach.

This was really encouraging for us as it felt as though the hypothesis we are testing really tallies with our broad approach. This is something you always hope to be the case but given that the hypothesis had been suggested by an external agency it was really encouraging to see it chimed with our experiences. 

What problems do we need to solve?

The two key areas that came out of the main bulk of the discussion were portability and the exception process. 

We had a useful discussion of what we actually mean by portability: how easy it is to move our applications from one system to another. Portability sits on a spectrum. At one end you could set your systems up to be instantly portable, so they could move from one cloud supplier to another at the click of a button. This would make it easy to take advantage of price competition between different suppliers.

In our discussion we challenged whether our scale and sophistication would make the investment in doing this worthwhile. Our overall view was that we needed sufficient ease of moving between clouds to avoid expensive vendor lock-in, but that we did not need to be able to move dynamically from cloud to cloud to take advantage of micro-level price changes between providers. We spoke instead about being able to move something over a timeframe of six months to a year, which would be much more manageable from a technical point of view and would not introduce huge upfront development and implementation costs.
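
As a rough illustration of what ‘sufficient ease of moving’ could mean in practice, the sketch below shows one common pattern: keeping provider-specific code behind a small interface, so that changing supplier becomes a bounded piece of work rather than a rewrite. The interface and classes are hypothetical, made up for this example rather than anything we have built.

```python
# Illustrative sketch only: application code depends on a small ObjectStore
# interface, so moving provider means swapping an adapter, not rewriting callers.
from typing import Protocol

class ObjectStore(Protocol):
    def put(self, key: str, data: bytes) -> None: ...

class S3Store:
    """Adapter for AWS S3 (uses the boto3 SDK)."""
    def __init__(self, bucket: str):
        import boto3
        self._client = boto3.client("s3")
        self._bucket = bucket

    def put(self, key: str, data: bytes) -> None:
        self._client.put_object(Bucket=self._bucket, Key=key, Body=data)

class GcsStore:
    """Adapter for Google Cloud Storage (uses the google-cloud-storage SDK)."""
    def __init__(self, bucket: str):
        from google.cloud import storage
        self._bucket = storage.Client().bucket(bucket)

    def put(self, key: str, data: bytes) -> None:
        self._bucket.blob(key).upload_from_string(data)

def archive_report(store: ObjectStore, report: bytes) -> None:
    # The calling code only knows about ObjectStore, not the cloud provider.
    store.put("reports/latest.pdf", report)
```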

The other area we discussed was how we would agree for applications to use a cloud provider other than our primary cloud supplier. We identified a number of factors to consider when determining whether something should be exempt from our primary cloud solution; these might include cost profile, meeting user needs, opportunity for innovation, and flexibility.

There will also potentially be some cases where we would choose to accept less portability in order to benefit from specific functionality (i.e. using proprietary cloud or PaaS features), which will have to be considered when exceptions are agreed. We will explore this further in line with the initial steer from this workshop.

What do we still need to know?

The areas that were the highest priority for us to follow up on were:

  • Deciding the criteria we will use to determine the primary cloud supplier
  • Thinking about our ‘minimum viable’ portability criteria
  • Agreeing some exception criteria for when it’s ok for teams not to use the primary option
  • Determining what capacity we would need to procure from our primary cloud supplier and considering our route to market (our default is the Digital Marketplace / G Cloud, but we need to make sure that’s the best route to take)

We will look into these four areas in more detail over the coming weeks.

DevOps Practices – w/c 26.08.19

This week we have continued exploring the hypothesis: “Adopting a consistent approach to containerisation will make it easier and more efficient to develop, test and (re)deploy services”. This has led us into two related areas: PaaS (Platform as a Service) and our deployment path.

Deployment Path

As part of this work JJ started by looking into what has been done already on API-Factory regarding their deployment pipeline. By pulling out some elements of this we were able to have a starting point for our workshop on Wednesday. We had people from teams across HackIT in attendance and we asked everyone to add questions to the pipeline in addition to the ones that JJ had pulled out beforehand. 

This made us realise that we were actually looking at the question at too great a level of granularity, and that the discussions were more beneficial when we talked about what we needed from a deployment pipeline in general. From this we identified four general things that we felt a more structured deployment pipeline would need to incorporate:

  1. A focus on improving quality over quick deployment
  2. Only having the environments we need and keeping them to a minimum
  3. As much room for automated testing as possible
  4. A greater emphasis on collecting measurements (which links to the work we’ve done to baseline our measures for testing the hypothesis).

Although this work is not strictly part of the ‘consistent approach to containerisation’ that we are testing in this part of the alpha it is very closely related and will help to lay the groundwork for the other part of this alpha, namely: “Picking a primary cloud supplier will save time & money”.

GOV.UK PaaS

The groundwork for testing our next hypothesis was also prepared by starting to think about Platform as a Service (PaaS), specifically GOV.UK’s PaaS offer. They offer cloud infrastructure (via AWS) alongside a number of automated tools that allow for faster deployments. On Thursday we arranged for the GOV.UK PaaS team to come into Hackney to give us a Show and Tell and run a deep dive into their product. 

It was a really good chance for us to have a bit more of a think about the benefits of PaaS in general and GOV.UK PaaS in particular. It was really encouraging to see how many people attended, and from how many different teams. If you are interested but were unable to attend, here are the slides: https://drive.google.com/file/d/0B72cBprw5rzGTUFVUWowU1FrTVh3UGpjeVRmQXpSUE50Sndr/view?usp=sharing.

This was followed by the GOV.UK PaaS team running a deep dive for people in HackIT to try the service in anger. This seemed to go really well, with people from the development and infrastructure teams working together to use PaaS to make deployments.
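
For anyone curious what that looks like: deployments on GOV.UK PaaS are driven by the Cloud Foundry command line tools. As a rough, hypothetical illustration (the app and space names below are made up), a scripted deployment can be very small indeed:

```python
# Illustrative sketch only: wrapping the Cloud Foundry CLI that GOV.UK PaaS uses,
# so a deployment could be triggered from a script or pipeline stage.
import subprocess

def deploy(app_name: str, space: str) -> None:
    # Point the CLI at the target space, then push the app
    # (application settings come from its manifest.yml).
    subprocess.run(["cf", "target", "-s", space], check=True)
    subprocess.run(["cf", "push", app_name], check=True)

if __name__ == "__main__":
    deploy("my-rent-account-staging", "sandbox")  # hypothetical names
```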

Next steps

Next week we will be kicking off the 2nd part of this alpha in earnest. We will be starting to test the other hypothesis mentioned above. On Wednesday we are holding a session on ‘Our approach to cloud – next steps’ with team leads from across HackIT. 

We have also started thinking about which team we might test our containerisation processes with. We are engaging with the team re-platforming My Rent Account to see if we might be able to test our hypothesis in a way that advances their work as well.

DevOps Practices – w.c. 19.08.19

The last few weeks have seen us really kick off the DevOps practices alpha in earnest. The three areas we have focused on are:
– engaging with teams regarding their existing practices,
– thinking about measures of success, and
– planning for our next steps.

Since we kicked off the alpha last week we have been focusing on this hypothesis:
“Adopting a consistent approach to containerisation will make it easier and more efficient to develop, test and (re)deploy services”.

Existing practices

Last week we ran a workshop with people from the API-Factory, infrastructure, security, and applications teams, focused on the current practices of the API-Factory team. Although sadly I had to miss it, this was a great chance for teams that do not normally discuss the details of their work together to start having these kinds of conversations.

Although these conversations aren’t a direct, measurable output of the DevOps practices work they are what creating a DevOps culture really is all about. Cate described it as one of the best meetings she’s been in since joining Hackney!

We also had an interesting chat with colleagues in the Data & Insight team, who are using containers but in a very different way to the API-Factory team. One of the things we are mindful of is testing an approach that is flexible enough for different teams with different needs, but that still creates consistency and predictability where it can.

Measures of success

JJ, who is with us from Digi2al working on this DevOps practices alpha phase, has spent a lot of time in the last couple of weeks thinking about how we can measure the success of whatever we try. This will mean that rather than having a hunch that our new approach is better we can actually point to some evidence.

To start the process of identifying measures of success we looked at ‘Accelerate: The Science of Lean Software and DevOps’ by Nicole Forsgren, Jez Humble and Gene Kim. We took their four key measures as our starting point: Lead Time, Deployment Frequency, Mean Time to Restore (MTTR) and Change Fail Percentage.

We have started engaging with various teams across HackIT, as well as suppliers, to see if we can get baseline data for these before we start testing anything. One of the things we have identified is that we are not currently collecting data on some of the measures above. This gives our test an opportunity to start collecting these useful insights, and means we will have recommendations to make regardless of the success of our test.
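
To show what collecting these could look like, here is a minimal sketch of how the four measures might be computed from simple deployment and incident records. The record format is an assumption made up for this example, not data we currently hold.

```python
# Illustrative sketch only: computing the four Accelerate measures from
# hypothetical deployment and incident records.
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List

@dataclass
class Deployment:
    committed_at: datetime   # when the change was committed
    deployed_at: datetime    # when it reached production
    failed: bool             # did it cause a failure in production?

@dataclass
class Incident:
    started_at: datetime
    restored_at: datetime

def lead_time(deploys: List[Deployment]) -> timedelta:
    """Median time from commit to running in production."""
    return median(d.deployed_at - d.committed_at for d in deploys)

def deployment_frequency(deploys: List[Deployment], days: int) -> float:
    """Deployments per day over the period measured."""
    return len(deploys) / days

def change_fail_percentage(deploys: List[Deployment]) -> float:
    """Share of deployments that caused a failure in production."""
    return 100 * sum(d.failed for d in deploys) / len(deploys)

def mean_time_to_restore(incidents: List[Incident]) -> timedelta:
    """Average time taken to restore service after a failure."""
    durations = [i.restored_at - i.started_at for i in incidents]
    return sum(durations, timedelta()) / len(durations)
```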

Next steps

We have a busy week ahead on the DevOps front.

On Wednesday we are holding a workshop with teams from across HackIT to design the deployment path we will use to test our hypothesis. This will build on the existing deployment path that API-Factory is using but take into account that it might not be the best fit for everything.

On Thursday we have the GOV.UK PaaS team coming in to give HackIT a show and tell on their product, followed by a deep dive with people from across HackIT’s development and infrastructure teams.