Housing data in the cloud: Weeknote, w/c 02.12.2019

Housing data in the cloud. It does what it says on the tin. 

We’ve got data, lots of data. 

This data lives in an old house which has had many occupants. Each occupant has added data, moved that data around, put it in different rooms, called it different things and used it to prop up the fabric of the building. The old house is weighed down with data, nobody can find what they are looking for and removing data risks a structural collapse. 

We are changing this. 

That’s a bold statement, let me put some context around it. We are taking our first experimental steps to see if we can move a tiny piece of housing data into a cloud platform. 

Back in October we did a week-long discovery to identify a candidate dataset for a prototype and to think about what success might look like. Thankfully, we’re not starting from scratch. Other fine minds have looked at our old, overstuffed house – we’ve valiantly attempted renovation and even extension. Colleagues from MadeTech pulled all this learning together and made a set of recommendations, which we are testing in the prototype we are building over the next few weeks.

Introducing the dream team

The Avengers getting ready for battle.
This is what we look like in my imagination. I’m Natasha Romanoff obviously 🙂

We are working with support from AWS and MadeTech, along with our award-winning team of Hackney developers. We’ve got expertise from our data and insight team and three technical architects (at my last count). I’m terrified; the team is absolutely buzzing. We’re finally staring into the eyes of our nemesis – let the battle commence.

We are working in five-day sprints. I love the drive of the team. They want to work hard and fast. We’re covering new ground every day. The team are absorbing new ideas, skills and ways of working like a sponge. We don’t all see things in the same way, but the team are embracing this too.

This week, we’ve identified our use case. Right from the outset we want to demonstrate how our work can bring tangible value to the Council services that rely on this data. We’ve got to keep this grounded in business need and, ultimately, the needs of Hackney residents. We’ve also set up our cloud platform in AWS this week. Next up: deciding which database we need for this prototype. There is A LOT of debate about this in the team. I’m expecting a few fireworks. We’ve got a spike early next week to try and crack this.

Property and Asset Management (PAM) data project weeknotes w/c 05.08.19

This week we focused on two main areas: continuing to better understand the scale of the issues with property and asset management data, and starting to think about how we might use the LLPG to address some of these issues (awful pun not intended).

In order to make sure that we were aligned with other work that had been done in this area, we spoke to Ian Marriot and his team (including Clayton Daniel as our demo-master), who explained the thinking that has guided their efforts on the Property Index. This tool compares data in Universal Housing (UH) and Codeman (the asset management system PAM currently uses).

This gave us a helpful overview of the scale of some of the data quality issues and also a view of some of the things they’ve considered to improve them. There were some headline numbers (more dwellings on UH than Codeman), which had reasonable explanations (UH includes additional dwelling types like temporary accommodation), but others that revealed more of a problem (blocks or sub-blocks marked as disabled on Codeman but not marked on UH).
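
To give a flavour of the kind of cross-system comparison the Property Index performs, here is a minimal, hypothetical sketch in pandas. The file names, column names and the flag being compared are made up for illustration; they are not the real Property Index implementation or schema.

```python
import pandas as pd

# Hypothetical extracts from the two systems; file and column names are
# illustrative, not the real Property Index schema.
uh = pd.read_csv("uh_properties.csv")        # Universal Housing extract
codeman = pd.read_csv("codeman_assets.csv")  # Codeman extract

# Line the two systems up on a shared property reference.
merged = uh.merge(codeman, on="property_ref", how="outer",
                  suffixes=("_uh", "_codeman"), indicator=True)

# Dwellings that appear in one system but not the other.
only_in_uh = merged[merged["_merge"] == "left_only"]
only_in_codeman = merged[merged["_merge"] == "right_only"]

# Records where a flag (e.g. a disabled-adaptation marker) disagrees.
flag_mismatch = merged[
    (merged["_merge"] == "both")
    & (merged["disabled_flag_uh"] != merged["disabled_flag_codeman"])
]

print(len(only_in_uh), len(only_in_codeman), len(flag_mismatch))
```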

In addition to this, we have started to put together a list of all the teams and services in Hackney that make direct use of PAM data, and which data items they are interested in. This will help us to start to build a picture of which data items are the mainstream ones that we need to accommodate in our solution.

It has also helped us hone our thinking on where we draw the line in terms of which data elements to add into the LLPG so that we can maintain data quality (both of our address data, which is super important to protect, and of the PAM data itself). We’ll need to come up with some clear criteria to test with PAM colleagues in the next sprint.

This week we have started to look in more detail at the LLPG itself. This has prompted a discussion about how we will test the hierarchies in the LLPG. We have discussed the merits of cross-referencing vs parent-child relationships as a method of creating our hierarchies in the LLPG.

We chose to test cross-referencing rather than parent-child because parent-child on its own would not allow us enough layers in the hierarchy, due to limitations of the LLPG structure and the software we use to maintain it. We considered a hybrid of parent-child and cross-references, but in our discussions parent-child did not seem to offer any significant benefit.
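
For anyone who hasn’t worked with gazetteer hierarchies, the sketch below shows the rough difference between the two approaches. It is purely illustrative Python with made-up field names; it is not the actual LLPG schema or the software we use to maintain it.

```python
# Illustrative only: two ways of representing a property hierarchy
# (e.g. block -> sub-block -> dwelling). Field names are made up and do not
# reflect the actual LLPG schema or gazetteer software.

# Parent-child: each record carries a single parent reference, so the depth
# of the hierarchy is constrained by what the gazetteer software supports.
parent_child = [
    {"uprn": "1001", "type": "block",    "parent_uprn": None},
    {"uprn": "1002", "type": "dwelling", "parent_uprn": "1001"},
]

# Cross-referencing: relationships live in a separate table, so a record can
# sit in as many layers (or alternative hierarchies) as we need.
records = {"1001": "block", "1002": "sub-block", "1003": "dwelling"}
cross_refs = [
    {"uprn": "1002", "related_uprn": "1001", "relationship": "within"},
    {"uprn": "1003", "related_uprn": "1002", "relationship": "within"},
]

def ancestors(uprn):
    """Walk the 'within' cross-references upwards from a record."""
    chain, current = [], uprn
    while True:
        link = next((x for x in cross_refs if x["uprn"] == current), None)
        if link is None:
            return chain
        chain.append(link["related_uprn"])
        current = link["related_uprn"]

print(ancestors("1003"))  # ['1002', '1001']
```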

Before the end of this sprint we’ll be talking through the pros and cons of this approach with the Dev team, Apps Management and other Data & Insight reps to get some healthy challenge and ensure that our proposed approach won’t cause unnecessary work for them down the line in surfacing PAM data for applications or reporting.

Property and Asset Management (PAM) Data Project – Weeknotes 1 w/c 15.07.2019

This is the first set of weeknotes for the PAM Data project and yes, I don’t yet have a better name for it. I tried to verbify by calling it ‘Get accurate PAM data’ but that still doesn’t feel quite right. Answers on a postcard please.

This phase of the project came about as a result of some excellent research carried out by the data and insight team earlier this year. Their recommendation was that “we invest to expand the LLPG to operate as the central, trusted source of unique references to support our property data schema”.

During this phase of the PAM Data Project (😬) we will be testing this recommendation to see if the LLPG is something that can become a ‘central trusted source’ for the majority of use cases of PAM data. This is part of a strategic move to make us less dependent on Universal Housing as a source of important data. 

The team is made up of: 

  • myself as a Delivery Manager
  • Lindsey Coulson as the product owner of the LLPG
  • Lisa Stidle working on data strategy
  • Herminia Matheiu and Yash Shah from Housing working as data analysts
  • Liz Harrison as the project sponsor

This project is being run using the Scrum methodology. This is the first time that many of the team have worked in this way, so I’m really interested in the feedback at retros to hear how they find it.

This week we kicked off the prototype / alpha phase with a planning session. Lisa, Liz and Lindsey had worked to produce a backlog and a team board with some hypotheses that this phase should test. We were able to use this to plan our first sprint’s worth of work that we are now working our way through. We will have a show and tell on the 31st of July for those that are interested.

This sprint is focused on researching and agreeing with stakeholders a first pass at the hierarchy of data items: floor, sub block, block and so on. Further to this we are keen to build on the work that multiple teams have done on property data for various projects. If you think that any work you’ve done could feed into this please get in touch.

Halfway through the GDS Data Science Accelerator programme

Back in March, Liz Harrison wrote about our successful application to the GDS Data Science Accelerator Programme and what we were hoping to achieve. I am now over halfway through the 12-week programme, and next week I’ll be giving a show and tell to the rest of my cohort. So now seems a good time for an update.

As a quick reminder, I am developing a unified view of the Hackney property market, focusing on rental properties. I am using the Python programming language for the majority of the analysis. This will be delivered via an interactive interface for easy access to those who need it within Hackney Council. We hope that this will help answer questions about houses in multiple occupation (HMOs), rental prices and property condition.

Have I kept to the project brief?

The project brief was extensive, and I was prepared for the fact that some aspects of it would fail. So what has failed? I was interested in looking at Google Street View imagery to see if property features (age, style etc.) could be extracted through image processing techniques. It quickly became clear that, firstly, the imagery was not as freely available as it had been and, secondly, that building such a model could probably be an Accelerator project in its own right. This is fine; we can piece together this information from other data sources.

Linking data at the property level

We already hold data that is linked at a property level using the Unique Property Reference Number (UPRN). Through this work we have started to enrich what we currently know about a property by linking in new datasets. For example, Energy Performance Certificate (EPC) data can tell us a lot about property condition and size. We will continue to develop this property dataset beyond this project, as other areas of work will benefit, such as improving our understanding of our housing estates.
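
As a rough illustration of that UPRN linking, the sketch below joins a property list to EPC certificates in pandas. The file and column names are placeholders, not our actual datasets.

```python
import pandas as pd

# Placeholder file and column names; the real datasets differ.
properties = pd.read_csv("hackney_properties.csv")  # keyed on UPRN
epc = pd.read_csv("epc_certificates.csv")           # also carries a UPRN

# Keep the most recent certificate per property, then join on UPRN.
latest_epc = (epc.sort_values("lodgement_date")
                 .drop_duplicates("uprn", keep="last"))

enriched = properties.merge(
    latest_epc[["uprn", "current_energy_rating", "total_floor_area"]],
    on="uprn", how="left")

print(enriched.head())
```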

Identifying unlicensed HMOs

Through parallel work with our Private Sector Housing colleagues, it has always been clear that identifying unlicensed HMOs would be a valuable output from this work. Building a predictive model has therefore remained a priority, and I have received excellent guidance from the mentor group.
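
I won’t go into the model itself here, but to show the general shape of this kind of work, here is a hedged sketch of one common approach: a classifier trained on property-level features, with licensed HMOs providing the labels. The feature names, the file name and the choice of a random forest are assumptions for illustration, not necessarily what the final model will use.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Hypothetical training table: one row per property, with labels derived from
# properties already known to be HMOs through licensing records.
data = pd.read_csv("property_features.csv")
features = ["total_floor_area", "habitable_rooms", "council_tax_band_code",
            "noise_complaints_12m", "pest_control_reports_12m"]

X_train, X_test, y_train, y_test = train_test_split(
    data[features], data["is_hmo"], test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

# Scores for the remaining housing stock could then be used to rank
# properties for inspection by Private Sector Housing colleagues.
```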

Topic modelling of free text data

I wanted to undertake text analysis of reports made about housing, which could vary from pest control incidents to noise complaints. We hold a lot of free text data within our various council databases, and it felt like there was an opportunity to understand what residents tell us about the most. I’m hopeful that we will be able to visualise this spatially too.
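
For readers unfamiliar with topic modelling, the sketch below shows the general idea using scikit-learn’s LDA on a few made-up report snippets. The actual analysis will use our real free text data and may well use a different technique.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

# A few made-up report snippets standing in for our real free text data.
reports = [
    "mice seen in communal kitchen, please arrange pest control",
    "loud music from the flat above every night this week",
    "damp and mould in the bedroom, child has asthma",
]

vectoriser = CountVectorizer(stop_words="english")
counts = vectoriser.fit_transform(reports)

lda = LatentDirichletAllocation(n_components=3, random_state=0)
lda.fit(counts)

# Print the most heavily weighted words for each discovered topic.
terms = vectoriser.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"topic {i}: {', '.join(top)}")
```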

Rental prices

Gaining access to property-level rental pricing data has been challenging, and the legality of web scraping rental listing sites can be unclear, but aggregated data is available, so I’ll be making use of this within the model.

A recent piece of work on short-term lets in Edinburgh, published by the Scottish Parliament Information Centre, made use of Airbnb data. This has made us think about the effect on the Hackney rental market of entire properties being let out for the majority of the year.

Developing an interactive interface

We knew from the outset that we would need a way to communicate outputs to users, and therefore a data visualisation would need to be developed. For this to be useful, it would, as a minimum, need to be intuitive, interactive and able to present data spatially. My mentor has been helping me build such a tool in JavaScript.
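
The actual tool is being built in JavaScript, but to illustrate the kind of interactive, spatial output we’re aiming for, here is a quick Python sketch using folium with made-up points. It is not the real interface, just a stand-in for the idea.

```python
import folium

# Made-up points roughly in Hackney, purely to show the kind of output.
m = folium.Map(location=[51.545, -0.055], zoom_start=13)

example_points = [
    {"lat": 51.548, "lon": -0.057, "label": "Predicted HMO (high score)"},
    {"lat": 51.541, "lon": -0.049, "label": "EPC rating: D"},
]
for p in example_points:
    folium.CircleMarker([p["lat"], p["lon"]], radius=6,
                        popup=p["label"]).add_to(m)

m.save("hackney_property_map.html")  # open in a browser to explore
```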

Working with others from government

For me, the real strength of the Accelerator programme is the opportunity to work with and receive advice from data scientists from across government, as well as collaborate with peers in similar roles in local government. We all seem to be facing the same challenges.

In addition to help from my colleagues within the Council, I have also received support from the Office for National Statistics and GeoPlace, who have provided access to invaluable data and expertise.

This Accelerator cohort graduates on the 4th July.

Embedding an ethical approach to underpin our data science projects

We’re lucky in Hackney – over the course of 2018, our Data & Insight function has grown in both number and scope, and we’re one of only a few local authorities to employ a permanent data science resource. Our data scientist works closely with the rest of the team, whose overall focus is on joining and analysing the huge range of data we hold to help services better meet the needs of our residents. The talent and skills of the team, combined with the vision of our ICT leadership, which challenges us to look at the same problems in radically different ways, offers no small opportunity.

The private sector has led the way in practically employing data science techniques, harnessing vast swathes of information on customers and their spending habits to maximise revenues. For example, we’ve seen companies like Amazon use machine learning to persuade shoppers to part with extra cash at the online checkout by showcasing related products. The public sector has lagged behind, in part because of a lack of investment in the necessary skills but also due to the longstanding central government focus on data being used primarily for retrospective reporting. This has limited our ability to use our knowledge – about our residents and how they interact with services – more creatively. Shifting the focus to predictive analysis could help us change the way we support people in future, to help us deliver better services at lower cost.

We want to replicate the private sector’s success in leveraging the vast volumes of data we hold as an asset to improve our service provision. The opportunities include preventing homelessness; better targeting of resources to assess the social care cases that require the most urgent attention; improved customer service by channelling users to other services they may be interested in as they transact with us; and tackling fraud, to name a few.

While opportunity abounds, we face a unique challenge in meeting the expectations of our residents, who hold us to a much higher standard than private companies when it comes to handling their data. Many local government data teams are starting to work on predictive models, but we know system bias is a concern to the public. How can we trust the results of predictive algorithms that have been built on data which may be limited, or which only reflects how a long-established service has traditionally engaged with specific sections of our community?

If we are to successfully implement forward-looking predictive analytics models, we have to build trust with our citizens: to ensure that they understand our motivations, and can transparently assess how we work with data.

The approach we’re taking:

From the outset, we’ve been alert to the need to test and build an ethical approach to our data science work, which is still in its infancy.

Building on examples we’ve seen elsewhere, we’ve developed a Data Ethics Framework which nestles alongside our Privacy Impact Assessment (PIA) process in Hackney, to make sure that for every project we undertake, we’re able to clearly articulate that we’re using a proportionate amount of data to enable us to draw robust conclusions.

At each stage of our five-step project cycle, we stop to consider:

– Is our data usage proportionate and sufficient to meet user need?

– Are we using data legally, securely and anonymously?

– Are we using data in a robust, transparent and accountable way?

– Are we embedding and communicating responsible data usage?

One of the most important checks and balances on our work will come from an increasing focus on how we embed responsible use of our findings. Applying data science methods to our large volumes of data offers huge opportunities to provide a positive impact for residents, but we know there are risks if outputs are misinterpreted. We’re trying to mitigate this by developing our team’s skills to communicate our findings in the simplest way possible, so that local government officers don’t need to become expert data analysts to make responsible use of this work. Democratising access to data and providing everyone with the skills or tools they need to make sense of information has to be the right approach.

We’re taking small steps every day to improve our skills and maximise the benefit of the opportunity we have in Hackney. We’re learning from others – notably the gov.uk Data Ethics Workbook, which inspired our approach – and trying to embed it in a simple and proportionate way. The key for us is balance; we’ve tried to streamline this into a simple process with sufficient rigour to build confidence in the ethics of our work, without unnecessarily slowing down our ability to experiment. We’re keen to open up the conversation and hear from other public sector organisations who are beginning to unpick this sticky issue.

We also recognise that to truly build trust with our citizens on how and when we use their data, we need to openly engage with people. We’re thinking about how best to start a conversation with residents so we can hear their concerns, discuss the risks and opportunities and agree a way forward, together.