Rearchitecting the back-end: Dan has completed the rearchitecting of the back-end for our staging and test environments, and we are almost ready to roll this out to production. It will take us from running builds one at a time to running up to 5 concurrent builds. Omar has been working closely with Dan on this and getting his teeth into Terraform.
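To picture the change, here is a minimal Python sketch of capping build concurrency at five. This is purely illustrative; the real limit lives in the Terraform-managed infrastructure, and the names here are made up.

```python
from concurrent.futures import ThreadPoolExecutor
import time

MAX_CONCURRENT_BUILDS = 5  # the new limit described above

def run_build(build_id: int) -> str:
    """Stand-in for a real build job."""
    time.sleep(0.01)
    return f"build-{build_id}: ok"

def run_all(build_ids):
    # The pool caps how many builds run at once; the rest queue up,
    # instead of the old one-at-a-time arrangement.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_BUILDS) as pool:
        return list(pool.map(run_build, build_ids))

results = run_all(range(12))
```

Twelve builds all complete, with at most five in flight at any moment.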
Navigation: Luca has completed the implementation of the website’s navigation, which should make it much easier for users to move between different pages on the website.
Preparation for the end of the project: the team have started drawing up documentation and runbooks for supporting the website after Dan and Luca leave.
Challenges this week
The team’s time: it has continued to be difficult to get as much of the team’s time as we would like. This is partly because we have been unlucky with people being off sick; however, it is also a problem of people having multiple projects and existing commitments. We have tried to address some of these issues by coordinating with other managers and projects, and we will see next week whether this is effective.
The steep learning curve: members of the team have said that they find the learning curve really steep and that they aren’t sure how much of it they fully understand. This is expected, given that most of them were completely new to the principles of software development and DevOps when they started. By trying things out in practice, the team will hopefully be able to see how much they’ve learnt.
Next week is the final week of this phase of the DevOps programme. It will involve:
An end of phase show and tell
A hackathon to test how much the team have learnt
A phased handover of the DevOps knowledge that the HackIT teams need on the website
Continuing with refactoring the front end and providing learning opportunities for the team
This third phase of the DevOps programme is focused on strengthening support on the Hackney website. We are using DevOps tools and approaches to make the website more reliable and sustainable. It is also testing one of the hypotheses from our initial discovery:
“Developers working alongside people with ‘ops’ (i.e. infrastructure / applications support) skills will increase velocity, resilience, security and stability”
This project will run between 25th November and 20th December.
Previously we carried out a discovery and launched an initial alpha focused on cloud procurement and a DevOps pipeline approach.
The team is made up of people from the applications, infrastructure and delivery teams as well as a DevOps engineer, Dan, and a Front-End developer, Luca, from Digi2al (who we have been working with on the DevOps programme).
This is the first time we have applications people in our team, and it is a great opportunity to see how closely apps skills align with DevOps skills. Equally, for some of the team, this is the first time that they have worked on a development project using agile methodologies. As a result of all this, one of our key focuses is learning and development.
For the first time during the DevOps programme we have a team working on this almost full time, which is great as it allows us to more easily maintain focus and build momentum.
What are we doing?
This week we have been getting up to speed with the underlying technology of the website. Luca and Dan have been diving in at the deep end and quickly getting up to speed with the codebase. The rest of the team has been getting familiar with the technology and terms; and shadowing the more experienced members of the team.
We have to strike a balance in this project between outputs and knowledge transfer. One week into this four-week project, the HackIT team is primarily learning, while the Digi2al duo has been identifying areas where the team can up-skill, providing introductory reading and running tech workshops.
There is an existing backlog from the project that delivered the website, which includes a combination of bugs and small features. The Product Owner on this project, Susan, has been looking through the backlog with Dan and Luca to identify the tasks that add the most value and also make the website more maintainable and supportable in the future.
This has resulted in an early focus on two areas: search and WordPress. Luca’s refactoring and simplifying of search will greatly improve the quality of search on the website and make future improvements easier.
Secondly, Dan is working with Omar from Infrastructure to make the WordPress backend more reliable. It has a tendency to fall over during simultaneous deployments, requiring a manual restart. By restructuring the AWS architecture on which WordPress sits, we should be able to save our content publishers time and ensure the site is updated promptly for our residents.
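The general idea of stopping simultaneous deployments from colliding can be sketched with an advisory lock. This is a hedged illustration of the failure mode, not the actual AWS restructuring, and the lock path is hypothetical.

```python
import os
import errno

LOCK_PATH = "/tmp/wordpress-deploy.lock"  # hypothetical lock file

def try_acquire_deploy_lock(path: str = LOCK_PATH) -> bool:
    """Atomically create a lock file; return False if a deploy is in progress."""
    try:
        # O_CREAT | O_EXCL fails if the file already exists, so only
        # one deploy can hold the lock at a time.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.close(fd)
        return True
    except OSError as e:
        if e.errno == errno.EEXIST:
            return False
        raise

def release_deploy_lock(path: str = LOCK_PATH) -> None:
    """Remove the lock file once the deploy has finished."""
    os.remove(path)
```

With this, a second deploy attempt fails fast instead of knocking the site over; the real fix replaces hacks like this with a more robust AWS architecture.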
We will agree on areas that our HackIT team members can look at over the next three weeks that will add value and broaden their learning. They are also continuing to shadow the Digi2al members of the team.
Luca and Dan are working through the search and WordPress re-architecting, respectively, in addition to picking up minor bugs and code-reviewing other work that is taking place on the website. They are also running tech workshops and supporting the learning of the rest of the team by suggesting material to learn and exercises to try.
This week there were two main developments on the DevOps Practices work:
we kicked off the planning for the deployment pipeline, which will test our containerisation hypothesis
we have continued our engagement with wider teams, especially infrastructure and security on our cloud and containerisation work.
Last week we got agreement on our general approach and started to pull together a list of high-level needs (for more detail: https://blogs.hackney.gov.uk/hackit/devops-practices-w-c-09-02-19). This week we sat down with the team to try to turn this list into our first sprint. Although we set aside two and a half hours for this, we did not get to the stage of having a fully planned sprint 1 for our Deployment Pipeline.
However, we did manage a lot. We started by prioritising the list of needs, going through them all and picking the top 10 to discuss further. This led to some really good back-and-forth discussions about the value we would deliver, such as whether we should focus on optimising speed or on getting our testing right. In line with our general focus on quality over speed, we chose further testing.
We then built out our list of needs into full stories focusing on the impact of meeting these needs. Our final list of 11 (one got split into two) user stories was:
As a developer, I want to automate deployment so that I can deploy a set of code changes quickly.
As a developer, I want to prevent secrets from being distributed in the open so that our services remain secure.
As a developer, I want to know that no vulnerabilities through code changes have been introduced so that our services remain secure.
As a developer, I want automated accessibility testing so that our services are usable by all our users.
As a developer, I want to know that no vulnerabilities from 3rd party libraries have been introduced so that our services remain secure.
As a technical architect, I want reusable components that I can utilise to build a pipeline for my project quickly so that I can ensure code is delivered to the standards expected by HackIT.
As a developer, I want automated load testing so that I am confident that the service can cope with peak demand.
As a service owner, I want visibility of quality measures (build metrics, automated test reports) so that I am reassured the automation is working.
As a technical architect, I want to know that the code being written conforms to best practice and follows the development style at HackIT so that it is supportable, scalable and consistent.
As a technical architect, I want a consistent naming convention for all objects (code repositories, infrastructure, libraries) so that people can find their way around and understand the architecture of the code.
As a developer, I want a way for our technical documentation to be generated automatically from our code base so that I don’t have to remember to update documentation every time a change is made.
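Several of these stories reduce naturally to automated checks in the pipeline. As an illustration, the secrets story could begin life as a simple pattern scan. This is a minimal sketch with made-up patterns; real tools such as git-secrets or truffleHog go much further.

```python
import re

# Hypothetical patterns for illustration only; a real scanner would
# cover many more credential shapes and reduce false positives.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # shape of an AWS access key ID
    re.compile(r"(?i)(password|secret|token)\s*=\s*['\"][^'\"]+['\"]"),
]

def find_possible_secrets(text: str) -> list:
    """Return substrings that look like hard-coded credentials."""
    hits = []
    for pattern in SECRET_PATTERNS:
        hits.extend(m.group(0) for m in pattern.finditer(text))
    return hits

sample = 'db_password = "hunter2"\nregion = "eu-west-2"\n'
print(find_possible_secrets(sample))  # flags the hard-coded password line
```

A check like this would run on every commit, failing the build before a credential ever reaches a shared repository.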
At this point the stories are quite developer-heavy. I think this makes sense in the current context of separating our Terraform infrastructure from our applications, but it is something we are aware of and will be keen to redress as we go forward. Let us know if you have any feedback on these stories, as they are still open to refinement.
Once we had done this, we agreed our definitions of done and acceptance criteria for five of these stories.
We have continued to engage with a number of teams to feed into this alpha. People from the infrastructure team are going to be participating in the pipeline work. This upcoming week I will be working through the backlog with them to feed into our discussions on Friday. We have continued to engage regarding cloud capacity and these discussions have helped to refine our thinking on cloud procurement.
We are in a really good place to square the circle of keeping the infrastructure team and their skills a part of whatever cloud we procure, while acknowledging that not everything they are responsible for will be cloud-hosted in the foreseeable future.
We are also working alongside the security team to feed into the deployment pipeline and to make sure we can embed a culture in which security is baked into development.
The team are meeting on Thursday to break down our prioritised stories (with acceptance criteria) into actionable tasks. We will also continue to explore our strategy around procuring a primary cloud supplier.
This week there have been two focuses. The main one is planning how we will test our containerisation hypothesis with two teams who are working to replatform their products. Our other focus has been to build on the work we did last week in order to start to think about what framework we might use for evaluating a primary cloud supplier.
This week we have been in discussions with the Manage a Tenancy team and the My Rent Account team who are both about to start replatforming their products. We are going to work with the HackIT developers in these teams to develop a deployment pipeline for the work that they are carrying out.
This deployment pipeline will build on the work that has already been done by the API-Factory but with a greater focus around containerisation. We are going to use the replatforming of Manage a Tenancy to test that the approach works and then test if it is re-usable with My Rent Account.
This work has built on the discussion that we have had over the last few weeks. From those discussions JJ has put together a proposed structure for the pipeline, which we have been validating with stakeholders (pictured). For the time being we are proposing separating our Terraform infrastructure from our applications but this is something that we may consider bringing together as we develop our skills.
We have started to pull together a list of the needs that various roles have of a deployment pipeline, and we are getting people from across HackIT to feed into it. This list of needs will inform our approach over the coming weeks as the rubber hits the road and we start to test our hypothesis on containerisation in earnest.
Evaluating a Primary Cloud Supplier
We are building on the next steps from the workshop the week before last to start to think about how we will identify the right primary cloud supplier for HackIT. To help with this Ciaran has pulled together a list of questions for us to answer focusing on (in no particular order): connectivity, skills, costs, technology, and other general considerations.
Once we have started to answer those questions, we will be in a position to know where we need further investigation of user needs and technology, and where we are okay. I also spoke to Rob about the work that was done on ‘Next generation productivity’, a procurement approach that we are looking to emulate for picking a primary cloud supplier.
This week we will be breaking down the tasks that need to be carried out to build the pipeline for delivery. We will be holding further discussions around our cloud approach and we will be speaking to Applications teams about why containerisation is relevant to them and the work they do.
For these weeknotes I am going to write about the workshop we held last Wednesday afternoon to discuss ‘Our approach to cloud – next steps’. Although JJ has continued on work around our pipeline, which I will write about in more detail at a later date, this week the focus is cloud. We have started exploring the second hypothesis of this alpha, namely:
“Picking a primary cloud supplier will save us time and money”.
This workshop involved the management from a range of teams and disciplines from across HackIT and was a chance for us to have a conversation about our views on our cloud approach in an open and constructive way.
At the start of the workshop we set out to:
have a shared understanding of the terms/ideas we use around cloud
understand where we have differences of opinion, and where we have consensus
have explored our assumptions and hopes and fears
have an idea of the key strategic decisions we must make, and where we can safely test out ideas
Where are people at?
We wanted to start with a quick check to test what everyone’s current take was re: cloud. To do this we all did a self-doodle on a Post-It and used those to mark where we each were along a range of potential approaches, covering private cloud, single public cloud, hybrid cloud and multi-cloud. We quickly discarded private cloud (as no one picked that option) and focused on the other three.
We asked people to go through why they placed their views in each bucket. One of the things that emerged through this discussion is that the things that we valued were pretty consistent between people, even where we’d placed our Post-Its in different places. The common considerations that were brought up were: users’ and services’ needs, flexibility, simplicity / ease, value for money, and innovation.
Although people initially differed on which bucket they put their views into, through discussion a consensus emerged around a single approach that we felt best satisfied the considerations outlined above: adopting a ‘primary cloud, but with exceptions’. Some people called this multi-cloud and others single public cloud, but everyone broadly agreed that this should be our approach.
This was really encouraging for us as it felt as though the hypothesis we are testing really tallies with our broad approach. This is something you always hope to be the case but given that the hypothesis had been suggested by an external agency it was really encouraging to see it chimed with our experiences.
What problems do we need to solve?
The two key areas that came out of the main bulk of the discussion were portability and the exception process.
We had a useful discussion of what we actually mean by portability: how easy it is to move our applications from one system to another. At one end of the spectrum, you could set your systems up to be instantly portable, so they could move from one cloud supplier to another at the click of a button. This would make it easy to take advantage of price competition between different suppliers.
In our discussion we challenged whether our scale and sophistication would make the investment in doing this worthwhile. Our overall view was that we need enough ease of movement between clouds to avoid expensive vendor lock-in, but we do not need to move dynamically from cloud to cloud to take advantage of micro-level price changes between providers. Instead, we spoke about being able to move something over a timeframe of six months to a year, which would be much more manageable from a technical point of view and would not introduce huge upfront development and implementation costs.
The other area that we discussed was how we would agree for applications to use a cloud provider other than our primary cloud supplier. We identified a number of factors we would want to consider when determining that something should be exempt from our primary cloud solution; these might include cost profile, meeting user needs, opportunity for innovation, and flexibility.
There will also potentially be some cases where we would choose to accept less portability in order to benefit from specific functionality (i.e. using proprietary cloud or PaaS features), which is something that will have to be considered when exceptions are agreed. This will have to be something that we further consider in line with the initial steer in this workshop.
What do we still need to know?
The areas that were the highest priority for us to follow up on were:
Deciding the criteria we will use to determine the primary cloud supplier
Thinking about our ‘minimum viable’ portability criteria
Agreeing some exception criteria for when it’s ok for teams not to use the primary option
Determining what capacity we would need to procure from our primary cloud supplier and considering our route to market (our default is the Digital Marketplace / G Cloud, but we need to make sure that’s the best route to take)
We will look into these four areas in more detail over the coming weeks.