Halfway through the GDS Data Science Accelerator programme

Back in March, Liz Harrison wrote about our successful application to the GDS’s Data Science Accelerator Programme and what we were hoping to achieve. I am now over halfway through the 12-week programme, and next week I’ll be giving a show and tell to the rest of my cohort, so now seems a good time for an update.

As a quick reminder, I am developing a unified view of the Hackney property market, focusing on rental properties, using the Python programming language for the majority of the analysis. This will be delivered via an interactive interface, giving easy access to those within Hackney Council who need it. We hope that this will help answer questions about houses in multiple occupation (HMOs), rental prices and property condition.

Have I kept to the project brief?

The project brief was extensive, and I was prepared for the fact that some aspects of it would fail. So what has failed? I was interested in looking at Google Street View imagery to see if property features (age, style etc) could be extracted through image processing techniques. It quickly became clear that, firstly, the data was not as freely available as it had once been and, secondly, that the work involved in building such a model could probably be an Accelerator project in its own right. This is fine; we can piece together this information from other data sources.

Linking data at the property level

We already hold data that is linked at a property level using the Unique Property Reference Number (UPRN). Through this work we have started to enrich what we currently know about a property by linking in new datasets. For example, Energy Performance Certificate data can tell us a lot about property condition and size. We will continue to develop this property dataset beyond this project, as other areas of work will benefit, such as improving our understanding of our housing estates.
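A minimal sketch of this kind of UPRN join in Python, using pandas. The column names and values below are invented for illustration; they are not our actual schema.

```python
import pandas as pd

# Core property list (illustrative). Every property carries a UPRN.
properties = pd.DataFrame({
    "uprn": [100021000001, 100021000002, 100021000003],
    "address": ["1 Example St", "2 Example St", "3 Example St"],
})

# A hypothetical Energy Performance Certificate extract, also keyed on UPRN.
epc = pd.DataFrame({
    "uprn": [100021000001, 100021000003],
    "energy_rating": ["D", "C"],
    "floor_area_m2": [68.0, 54.5],
})

# A left join keeps every property, even those without a certificate,
# so missing EPC fields show up as NaN rather than dropped rows.
enriched = properties.merge(epc, on="uprn", how="left")
print(enriched)
```

Because the UPRN is the common key across council datasets, the same pattern extends to licensing records, complaints and other sources one merge at a time.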

Identifying unlicensed HMOs

Through parallel work with our Private Sector Housing colleagues, it was always clear that identifying unlicensed HMOs would be a valuable output from this work. Therefore building a predictive model has remained a priority, and I have received excellent guidance from the mentor group.
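To illustrate the general shape of such a model (not our actual model, features or data, all of which are invented here), a simple classifier can score properties by likelihood of being an unlicensed HMO, so that inspections can be prioritised:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Entirely synthetic data for illustration: floor area, complaint
# counts and an energy rating band stand in for real property features.
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.normal(80, 20, n),   # floor area (m2)
    rng.poisson(1.5, n),     # complaints in the last year
    rng.integers(1, 7, n),   # energy rating band (1=A .. 6=F)
])
# Fake labels loosely correlated with the features.
y = (0.02 * X[:, 0] + 0.8 * X[:, 1] + rng.normal(0, 1, n) > 3).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Rank unseen properties by predicted probability for inspection triage.
scores = model.predict_proba(X_test)[:, 1]
```

The useful output is the ranked probability rather than a hard yes/no: officers can work down the list from the highest-scoring properties.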

Topic modelling of free text data

I wanted to undertake text analysis of reports made about housing which could vary from pest control incidents to noise complaints. We hold a lot of free text data within our various council databases, and it felt like there was an opportunity to understand what residents tell us about the most. I’m hopeful that we will be able to visualise this spatially too.

Rental prices

Gaining access to property-level rental pricing data has been challenging, and the legality of web scraping rental listing sites can be unclear. However, aggregated data is available, so I’ll be making use of this within the model.

A recent piece of work on short-term lets in Edinburgh, published by the Scottish Parliament Information Centre, made use of Airbnb data. This has made us think about the effect on the Hackney rental market of entire properties being let out for the majority of the year.

Developing an interactive interface

We knew from the outset that we would need a way to communicate outputs to users, and therefore a data visualisation would need to be developed. For this to be useful, it would as a minimum need to be intuitive, interactive and able to present data spatially. My mentor has been helping me build such a tool in JavaScript.

Working with others from government

For me, the real strength of the Accelerator programme is the opportunity to work with and receive advice from data scientists from across government, as well as collaborate with peers in similar roles in local government. We all seem to be facing the same challenges.

In addition to help from my colleagues within the Council, I have also received support from the Office for National Statistics and GeoPlace, who have provided access to invaluable data and expertise.

This Accelerator cohort graduates on the 4th July.

Embedding an ethical approach to underpin our data science projects

We’re lucky in Hackney – in 2018 our Data & Insight function grew in both number and scope, and we’re one of only a few local authorities to employ a permanent data science resource. Our data scientist works closely with the rest of the team, whose overall focus is on joining and analysing the huge range of data we hold, to help services better meet the needs of our residents. The talent and skills of the team, combined with the vision of our ICT leadership, which challenges us to look at the same problems in radically different ways, offers no small opportunity.

The private sector has led the way in practically employing data science techniques, harnessing vast swathes of information on customers and their spending habits to maximise revenues. For example, we’ve seen companies like Amazon use machine learning to persuade shoppers to part with extra cash at the online checkout by showcasing related products. The public sector has lagged behind, in part because of a lack of investment in the necessary skills but also due to the longstanding central government focus on data being used primarily for retrospective reporting. This has limited our ability to use our knowledge – about our residents and how they interact with services – more creatively. Shifting the focus to predictive analysis could help us change the way we support people in future, to help us deliver better services at lower cost.

We want to replicate the success of the private sector in leveraging the vast volumes of data we hold as an asset to improve our service provision. The opportunities include preventing homelessness; better targeting of resources to assess the social care cases that require most urgent attention; improved customer service by channelling users to other services they may be interested in as they transact with us; and tackling fraud, to name a few.

While opportunity abounds, we face a unique challenge in meeting the expectations of our residents, who hold us to a much higher standard than private companies when it comes to handling their data. Many local government data teams are starting to work on predictive models, but we know systemic bias is a concern to the public. How can we trust the results of predictive algorithms that have been built on data which may be limited, or which may only reflect how a long-established service has traditionally engaged with specific sections of our community?

If we are to successfully implement forward-looking predictive analytics models, we have to build trust with our citizens: to ensure that they understand our motivations, and can transparently assess how we work with data.

The approach we’re taking:

From the outset, we’ve been alert to the need to test and build an ethical approach to our data science work, which is still in its infancy.

Building on examples we’ve seen elsewhere, we’ve developed a Data Ethics Framework which nestles alongside our Privacy Impact Assessment (PIA) process in Hackney, to make sure that for every project we undertake, we’re able to clearly articulate that we’re using a proportionate amount of data to enable us to draw robust conclusions.

At each stage of our five-step project cycle, we stop to consider:

– Is our data usage proportionate and sufficient to meet user need?

– Are we using data legally, securely and anonymously?

– Are we using data in a robust, transparent and accountable way?

– Are we embedding and communicating responsible data usage?

One of the most important checks and balances on our work will come from an increasing focus on how we embed responsible use of our findings. Applying data science methods to our large volumes of data offers huge opportunities to provide a positive impact for residents, but we know there are risks if outputs are misinterpreted. We’re trying to mitigate this by developing our team’s skills to communicate our findings in the simplest way possible, so that local government officers don’t need to become expert data analysts to make responsible use of this work. Democratising access to data, and providing everyone with the skills or tools they need to make sense of information, has to be the right approach.

We’re taking small steps every day to improve our skills and maximise the benefit of the opportunity we have in Hackney. We’re learning from others – notably the gov.uk Data Ethics Workbook, which inspired our approach – and trying to embed it in a simple and proportionate way. The key for us is balance; we’ve tried to streamline this into a simple process with sufficient rigour to build confidence in the ethics of our work without unnecessarily slowing down our ability to experiment. We’re keen to open out the conversation and hear from other public sector organisations who are beginning to unpick this sticky issue.

We also recognise that to truly build trust with our citizens on how and when we use their data, we need to openly engage with people. We’re thinking about how best to start a conversation with residents so we can hear their concerns, discuss the risks and opportunities and agree a way forward, together.